I think the confusion comes from the (mistaken) idea that first.VAR is the first *value* of VAR within a group -- so, for example, if the first value of sex is 1, then first.sex would equal 1, and if the first value of age were 38, then first.age would equal 38.
But that is NOT what first.VAR or last.VAR do at all. first.VAR is simply an indicator of whether or not we are at the first row within that BY group -- 1 if true, 0 if false (last.VAR is the same thing for the last row). So first.VAR is a separate, temporary variable that SAS creates automatically; it is not a copy of VAR's value.
It is also tied to the BY statement -- first.<variable> and last.<variable> do not exist if you do not have a corresponding BY statement. For example, let's say you have the following dataset, which we sort by state, and within state, by year, and within year, by month.
That allows us to have a BY statement that says BY state year month; (the order has to match the sorting). To see the values of first.state, first.year and first.month, we just save them in regular variables, since the first. and last. variables themselves are not written to the output dataset (normally there's no need to do this -- you can just use the first. or last. variables directly and be done with them).
data test;
    /* Read the comma-delimited inline data below */
    infile cards dsd truncover firstobs=1 dlm=',';
    length state $2 year month 3 rate 8;
    input state year month rate;
    cards;
NY,2020,1,0.37
NY,2020,2,0.39
NY,2020,3,0.32
NY,2020,4,0.36
NY,2021,1,0.09
NY,2021,2,0.14
NY,2021,3,0.14
NY,2021,4,0.20
MA,2020,1,0.29
MA,2020,2,0.41
MA,2020,3,0.35
MA,2020,4,0.33
MA,2021,1,0.18
MA,2021,2,0.17
MA,2021,3,0.22
MA,2021,4,0.27
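;
run;

/* Sort so the data order matches the BY statement used below */
proc sort data=test; by state year month; run;

data test2;
    set test;
    by state year month;
    /* Copy the temporary first. flags into regular variables so they are kept */
    is_first_dot_state = first.state;
    is_first_dot_year  = first.year;
    is_first_dot_month = first.month;
run;

proc print data=test2 noobs; run;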
Result is the following:
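Separately, since the first. and last. variables are normally used directly rather than saved, here is a rough sketch of that kind of use. It is only an illustration, not part of the example above: it assumes the sorted test dataset from the code above, and the names yearly_totals and total_rate are made up for this sketch. It adds up rate within each state/year group and keeps one row per group.

```sas
data yearly_totals;               /* hypothetical output dataset name */
    set test;                     /* assumes test is already sorted by state year month */
    by state year;
    /* Reset the running total on the first row of each state/year group */
    if first.year then total_rate = 0;
    /* Sum statement: total_rate is retained from row to row */
    total_rate + rate;
    /* Write a single row per group, when we reach its last row */
    if last.year then output;
    keep state year total_rate;
run;

proc print data=yearly_totals noobs; run;
```

Note that first.year is also set to 1 whenever state changes, even if the year value happens to repeat across states, because year is nested within state in the BY statement.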