SAS: differences within a column (by id) and skipping rows based on a condition

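I want to calculate, in the number column, the difference between the end of a given year and the start of the next year. The following table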

+----+------------+--------+
| id |    year    | number |
+----+------------+--------+
|  1 | 22Feb2008  |      1 |
|  1 | 8Aug2008   |      2 |
|  1 | 1Jan2009   |      3 |
|  1 | 5Dec2009   |      8 |
|  1 | 2March2010 |      2 |
|  1 | 10Dec2010  |      5 |
|  1 | 5Jan2011   |     12 |
|  1 | 7Nov2011   |      9 |
|  2 | 6Feb2005   |      8 |
|  2 | 8Nov2005   |     12 |
|  2 | 7Apr2006   |      5 |
|  2 | 8Dec2006   |      4 |
+----+------------+--------+
should become

+----+-----------+--------------+
| id |   year    |    change    |
+----+-----------+--------------+
|  1 | 8Aug2008  | from 2 to 3  |
|  1 | 5Dec2009  | from 8 to 2  |
|  1 | 10Dec2010 | from 5 to 12 |
|  2 | 8Nov2005  | from 12 to 5 |
+----+-----------+--------------+
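where change is a character (string) variable. For each id there are 1 or 2 dates in a given year. I want the change from period t-1 to period t to be pasted into the row for period t-1. Because I only care about the change from the second (end-of-year) date of one year to the first (start-of-year) date of the next year, I do not use the start date of 2008 or the end date of 2011.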

data HAVE_VW / view=HAVE_VW; /* you do not have to make this a view, but it saves time and space */
    set HAVE;
    by id year; /* This is not needed, but if your input data is not sorted as assumed, this will at least give an error */

    year_nr = year(year);
run;



data WANT;
    set HAVE_VW;
    by id year_nr;


    /* remember the number at the beginning of the year */
    if first.year_nr then begin_number = number;
    retain begin_number; /* a variable that is not retained is reinitialized for each observation (aka row) */

    if last.year_nr then do;
        change = compbl('from '||put(begin_number, 8.)||' to '||put(number, 8.)); /* calculate the change */

        output; /* If you add an output statement all observations for which you do not execute an output are deleted */
    end;
run;
Have:

Solution, assuming the data is already sorted correctly:

data want(drop=number start);
   set have;
   by id;
   length change $100;
   retain start;
   if first.id then start=.;
   if index(strip(year),'end') then start = number;
   else if index(strip(year),'start') and start ne . then do;
      change = "from "|| strip(start) || " to " || strip(number);
      output;
   end;
run;
Output:


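You can do it like this. Note that if the actual numbers are larger, some adjustments will be needed. The data set is assumed to be sorted by date within each subject. Note also that this gives the change from the end of one year to the start of the next year present in the data set, whether or not those years are consecutive.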

data input;
    input id $ 1-1 @3 date date9.  number;
    format date ddmmyy10.;
    numyear=year(date);
datalines;
1 01JAN2008 1
1 31DEC2008 2
1 02FEB2009 3
1 31DEC2009 8
1 01JAN2010 2
1 02JAN2010 5
1 01JAN2011 12
1 31DEC2011 9
2 01JAN2005 8
2 31DEC2005 12
2 01JAN2006 5
2 31DEC2006 4
;

data output;
    set input;
        by id numyear;
    retain year change0 ;
    /*  Starting a new year, combine the retained values with the new ones, and output.
        This block runs first so that a year with a single observation still uses the
        values retained from the previous year. */
    if first.numyear and not first.id then do;
        change=cat(strip(change0), " to ", strip(put(number,3.)));
        /*  Commenting out the following KEEP statement makes it easier to follow the logic of the program */
        keep id year change;
        output;
    end;
    /*  Reaching the end of a year, store the values that are retained for the following rows */
    if last.numyear then do;
        year=cat("end ",put(numyear, 4.0));
        change0=cat("from ", strip(put(number,3.)));
    end;
run;
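
The caveat above about larger numbers comes from the fixed-width 3. format in put(number,3.), which only has room for three characters. As a small sketch (the variable names narrow and wide are made up for illustration), a wider format such as best12. followed by strip avoids having to guess the width in advance:

```sas
data _null_;
    number = 123456;                        /* illustrative value wider than three digits */
    narrow = strip(put(number, 3.));        /* 3. has no room for all six digits */
    wide   = strip(put(number, best12.));   /* best12. allows up to 12 characters */
    put narrow= wide=;                      /* write both values to the log for comparison */
run;
```

Comparing the two values in the log shows whether the narrower format is still adequate for the real data.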

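Use the same method as in your previous question. Only this time test for a change in year, year(date) ne lag(year(date)), to decide which observations to output.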

data have;
  input id date number ;
  informat date date.;
  format date date9.;
cards;
1 22Feb2008  1
1  8Aug2008  2
1  1Jan2009  3
1  5Dec2009  8
1  2Mar2010  2
1 10Dec2010  5
1  5Jan2011 12
1  7Nov2011  9
2  6Feb2005  8
2  8Nov2005 12
2  7Apr2006  5
2  8Dec2006  4
;

proc sort data=have out=want ;
  by id descending date ;
run;

data want ;
  set want;
  by id ;
  change=catx(' to ',number,lag(number));
  if year(date) ne lag(year(date)) and not first.id then output;
run;

proc sort; by id date; run;
proc print; run;
Result:

Obs    id         date    number    change

 1      1    08AUG2008       2      2 to 3
 2      1    05DEC2009       8      8 to 2
 3      1    10DEC2010       5      5 to 12
 4      2    08NOV2005      12      12 to 5

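What is the format of the year variable? What are the actual values?

They are formatted as dates, for example 22Aug2019, not necessarily 31Dec2019. The end of a year is simply later than the start of the year.

I apologize. Apparently the question was poorly worded. What can I do to appease the voters? Is something unclear? Is the question inappropriate for this site, or too easy?

Sure, rephrase it: for example, replace the symbolic "start 2008" and "end 2008" with actual values. And explain why you do not need the data for id=1, year=2011 and id=2, year=2016.

What is the computed difference? Can the date values in the "year" column be out of order within an id group? Does a "year" always contain exactly two dates?

This is very similar to your earlier question, whose answer demonstrated how to use "lead" data from the next row.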
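
The look-ahead ("lead") idea mentioned in the comments can be sketched with a one-to-one merge of the data set with itself, offset by one row. This is only an illustration, assuming HAVE is the data set built in the last answer (a numeric SAS date in date, rows sorted by id and date); the names next_id, next_date and next_number are invented here.

```sas
data want;
    /* No BY statement: row i of HAVE is paired with row i+1 of the second copy */
    merge have
          have(firstobs=2
               keep=id date number
               rename=(id=next_id date=next_date number=next_number));
    length change $40;
    /* Only compare rows that belong to the same id */
    if id = next_id then do;
        /* Keep the row when the next row falls in a different year */
        if year(date) ne year(next_date) then do;
            change = catx(' ', 'from', number, 'to', next_number);
            output;
        end;
    end;
    keep id date change;
run;

proc print data=want; run;
```

With the sample data this should yield the same four rows as the requested table, and it avoids the two sort steps, at the cost of reading the data set twice.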