Warning: file_get_contents(/data/phpspider/zhask/data//catemap/5/date/2.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/6/EmptyTag/128.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Date 第一。最后。多次发生_Date_If Statement_Sas - Fatal编程技术网

Date 第一。最后。多次发生

Date 第一。最后。多次发生,date,if-statement,sas,Date,If Statement,Sas,我正在尝试使用SAS中的first.variable和last.variable从数据集收集日期。以下是创建可复制示例的代码: data example; infile datalines delimiter = ","; input id $ code $ valid_from valid_to; format valid_from IS8601DA10. valid_to IS8601DA10.; datalines; 1A,ABC,20058,20177 1A,DEF,

我正在尝试使用SAS中的first.variable和last.variable从数据集收集日期。以下是创建可复制示例的代码:

data example;
infile datalines delimiter = ",";
input id $ code $ valid_from valid_to;
format valid_from IS8601DA10. valid_to IS8601DA10.;
datalines;
1A,ABC,20058,20177
1A,DEF,20178,20481
1A,DEF,20482,20605
1A,DEF,20606,21548
1A,DEF,21549,21638
1A,DEF,21639,21729
1A,ABC,21730,21733
1A,ABC,21734,21808
1B,MNO,20200,20259
1B,PQR,20260,20269
1B,STU,20270,20331
1B,VWX,20332,20361
1B,VWX,20362,22108
1B,VWX,22109,22164
1B,VWX,22165,22165
1B,VWX,22166,2936547
;
run;
我们的想法是,对于每个
id
,每个
code
只获得一个观察值,以及它所涵盖的相应日期范围

这是我的密码:

proc sort data=example out=example_sorted; by code valid_from; run;

data collapse_val_dates; 
set example_sorted;
by code valid_from;

if first.code = 1 and last.code = 1 then do;
    output;
end;
if first.code = 1 and last.code = 0 then do;
    hold = valid_from;
    retain hold;
    end;
if first.code = 0 and last.code = 1 then do;
        valid_from = hold;
        output;
    end;

drop hold;
run;
以下是结果(表格折叠日期):

它产生了我对
id=1B
的期望,但对
id=1A
却没有。事实上,由于
code=ABC
在开始时出现一次,在结束时出现两次,结果表将
valid\u from=2014-12-01

我想要的是来自
code=ABC
有效\u将是
2019-06-30
。换句话说,如果中间有(或多个)其他代码,我希望SAS“忘记”第一次出现的代码。最终的表格如下所示:

+----+------+------------+------------+
| id | code | valid_from |  valid_to  |
+----+------+------------+------------+
| 1A | DEF  | 2015-03-31 | 2019-06-29 |
| 1A | ABC  | 2019-06-30 | 2019-09-16 |
| 1B | MNO  | 2015-04-22 | 2015-06-20 |
| 1B | PQR  | 2015-06-21 | 2015-06-30 |
| 1B | STU  | 2015-07-01 | 2015-08-31 |
| 1B | VWX  | 2015-09-01 | 9999-12-31 |
+----+------+------------+------------+

对于单次传递数据,您不能在串行处理组时输出
代码
的日期范围,因为组中稍后的行可能会覆盖所需的日期范围

你也需要这样做

  • 编码多个步骤,或
  • 执行一次传递并使用临时存储
假定
have
id-valid\u-from
排序,并且
valid\u-to
id
组中从不与后续
valid\u重叠

多个步骤

计算按连续
code
分组的行的
组号
,以用于最终排序

* multi step way;

data stage1;
  set have;
  by id code notsorted;
  if first.code then group_number+1;
run;

proc sort data=stage1 out=stage2;
  by id code group_number valid_from;
run;

* remember there can be multiple contiguous code groups within id & group;

data stage3;
  do until (last.code);
    set stage2;
    by id code group_number;
    if first.group_number then _start = valid_from;
    if last.code then do;
      valid_from = _start;
      OUTPUT;                /* date range for contiguous code group */
    end;
  end;
  drop _start;
run;

proc sort data=stage3 out=want(drop=group_number);
  by id valid_from;
run;
单程

DOW循环(其中包含
SET
语句的循环)可以计算组和子组的结果,并为每个组合输出一行。临时存储可以是散列(用于任意数量的子组),也可以是假定最大数量的子组的数组

例如:

固定大小
1000
的临时数组用于存储检查组时应修改的临时数据

* find the range of the dates from the last set of contiguous rows of a code within id;

data want(keep=id code valid_:);
  array dates (1000,2)   8 _temporary_; /* ,1 for _from and ,2 for _to */
  array codes (1000)   $50 _temporary_;
  array seq   (1000)     8 _temporary_; /* sequence for output order */

  * process the id group;
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id code notsorted;

    * save start of date range in temporay storage;
    if first.code then do;

      * linear search for slot to use for subgroup code;
      do _index = 1 by 1 
        until (missing(codes(_index)) or codes(_index)=code);
      end;

      codes(_index) = code;
      dates(_index,1) = valid_from;
      seq  (_index) = _n_ + _index / 1000; * encode order value with lookup index;
    end;

    * save end of date range;
    if last.code then 
      dates(_index,2) = valid_to;
  end;

  *---;

  * process each code within id;

  call sort (of seq(*)); * order of first date for last code subgroup;

  do _index = 1 to dim(seq);
    if missing(seq(_index)) then continue;

    * extract encoded information;
    _ix = round((seq(_index) - int(seq(_index))) * 1000);

    code = codes(_ix);
    valid_from = dates(_ix,1);
    valid_to   = dates(_ix,2);

    OUTPUT;
  end;

  * clear out temporary arrays for next group processing;
  call missing (of dates(*), of codes(*), of seq(*));
run;

您是否正在尝试折叠重叠的间隔?或者只获取每个ID代码组合的最小和最大日期,无论间隔是否重叠?
* find the range of the dates from the last set of contiguous rows of a code within id;

data want(keep=id code valid_:);
  array dates (1000,2)   8 _temporary_; /* ,1 for _from and ,2 for _to */
  array codes (1000)   $50 _temporary_;
  array seq   (1000)     8 _temporary_; /* sequence for output order */

  * process the id group;
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id code notsorted;

    * save start of date range in temporay storage;
    if first.code then do;

      * linear search for slot to use for subgroup code;
      do _index = 1 by 1 
        until (missing(codes(_index)) or codes(_index)=code);
      end;

      codes(_index) = code;
      dates(_index,1) = valid_from;
      seq  (_index) = _n_ + _index / 1000; * encode order value with lookup index;
    end;

    * save end of date range;
    if last.code then 
      dates(_index,2) = valid_to;
  end;

  *---;

  * process each code within id;

  call sort (of seq(*)); * order of first date for last code subgroup;

  do _index = 1 to dim(seq);
    if missing(seq(_index)) then continue;

    * extract encoded information;
    _ix = round((seq(_index) - int(seq(_index))) * 1000);

    code = codes(_ix);
    valid_from = dates(_ix,1);
    valid_to   = dates(_ix,2);

    OUTPUT;
  end;

  * clear out temporary arrays for next group processing;
  call missing (of dates(*), of codes(*), of seq(*));
run;