SAS: join two datasets on the closest timestamp

sas

I need to join two tables on the closest timestamp:

data a;
  input id name :$5. timea :time8.;
  format timea time5.;
  cards;
  1 John 9:17 
  1 John 10:25
  2 Chris 9:17 
  3 Emily 14:25
;run;

data b;
  input id name :$5. timeb :time8.;
  format timeb time5.;
  cards;
  1 John 9:00 
  1 John 10:00
  2 Chris 9:00 
  3 Emily 14:30
;run;

Table Want:
id name   timea  timeb
1  John   9:17   9:00
1  John   10:25  10:00
2  Chris  9:17   9:00
3  Emily  14:25  14:30

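My approach is to build key = id || name in table b, sort by key, and then create an interval for each timestamp in table b. After running the following code, John's first timestamp is missing: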

data time(rename=(prev_TimeB=TimeB));
  length start_time end_time 8;
  retain start_time 0 prev_TimeB;
  set B(keep=key TimeB) end=last;   /* KEY must be kept for BY processing */
  by key;
  if not first.key then do;
    end_time = TimeB - ((TimeB - prev_TimeB) / 2);
    output;
    prev_timeB = TimeB;
    if last.key then do;
      end_time = '23:59:59.999't;
      output;
    end;
  end;
  format prev_timeB start_time end_time time12.3;
  drop TimeB;
run;
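A minimal sketch of the interval approach described in the question, assuming each B time should own the half-open interval from the midpoint with its predecessor to the midpoint with its successor. In the code above, `prev_TimeB` is only assigned inside the `if not first.key` block, so it is still missing when the second row of a key is processed, and the first B time of each key is lost. Here `prev_TimeB` is set on every row. The dataset names `b_keyed`, `a_keyed`, `intervals` and `want` are hypothetical, and the last interval is closed at 86400 seconds (24:00:00) instead of `'23:59:59.999't`.

```sas
/* Sample data from the question (B reads TIMEB) */
data a;
  input id name :$5. timea :time8.;
  format timea time5.;
cards;
1 John 9:17
1 John 10:25
2 Chris 9:17
3 Emily 14:25
;

data b;
  input id name :$5. timeb :time8.;
  format timeb time5.;
cards;
1 John 9:00
1 John 10:00
2 Chris 9:00
3 Emily 14:30
;

/* KEY = id || name, as in the question; CATX strips blanks */
data b_keyed;
  set b;
  length key $ 40;
  key = catx('|', id, name);
run;

proc sort data=b_keyed;
  by key timeb;
run;

/* One interval [start_time, end_time) per B time, split at midpoints */
data intervals;
  retain start_time prev_TimeB;
  set b_keyed(keep=key timeb);
  by key;
  if first.key then start_time = 0;              /* first interval of a key starts at 0:00 */
  else do;
    end_time = prev_TimeB + (timeb - prev_TimeB) / 2;  /* midpoint between neighbours */
    output;                                      /* interval owned by the previous B time */
    start_time = end_time;
  end;
  prev_TimeB = timeb;                            /* runs on every row, including first.key */
  if last.key then do;
    end_time = 86400;                            /* 24:00:00: last B time owns the rest of the day */
    output;
  end;
  format prev_TimeB start_time end_time time12.3;
  drop timeb;
run;

data a_keyed;
  set a;
  length key $ 40;
  key = catx('|', id, name);
run;

/* Non-equi join: each A time falls into exactly one interval of its key */
proc sql;
  create table want as
    select x.id, x.name, x.timea, y.prev_TimeB as timeb format=time5.
    from a_keyed as x
    left join intervals as y
      on x.key = y.key
     and x.timea >= y.start_time
     and x.timea < y.end_time
    order by x.id, x.name, x.timea;
quit;
```

Because the join condition is a range rather than an equality, several A rows can land in the same interval, which covers the n-to-1 case mentioned later in the thread. A rows whose key does not occur in B keep a missing `timeb`.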

Thanks for your time.

If datasets A and B are sorted, you can add a temporary variable pos=_n_ to both tables:

Data a;
  set a;
  pos=_n_;
run;
Data b;
  set b;
  pos=_n_;
run;

You will then have the following tables:

id name  timea pos     id name  timea pos
1  John  9:17  1       1  John  9:00  1
1  John  10:25 2       1  John  10:00 2
2  Chris 9:17  3       2  Chris 9:00  3
3  Emily 14:25 4       3  Emily 14:30 4

Then you can join them in a PROC SQL step:

proc sql;
  create table result as
  select *
  from a t1
  left join b t2
    on t1.pos=t2.pos;
quit;

If the datasets are not sorted, sort them into the correct order first.

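Find the record with the minimum absolute difference. This is easier to code in SAS, because PROC SQL automatically remerges the aggregate function values back with the detail records: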

data a;
  input id name :$5. timea :time8.;
  format timea time5.;
cards;
1 John 9:17
1 John 10:25
2 Chris 9:17
3 Emily 14:25
4 Joe 11:21
;

data b;
  input id name :$5. timeb :time8.;
  format timeb time5.;
cards;
1 John 9:00
1 John 10:00
2 Chris 9:00
3 Emily 14:30
;

proc sql ;
  create table C as
   select a.*
        , timeb
        , timea-timeb as seconds
        , abs(calculated seconds) as distance
   from a
   left join b
   on a.id = b.id and a.name = b.name
   group by a.id,a.name,a.timea
   having min(calculated distance) = calculated distance
  ;
quit;

Result:

id    name     timea    timeb    seconds    distance
1    John      9:17     9:00      1020       1020
1    John     10:25    10:00      1500       1500
2    Chris     9:17     9:00      1020       1020
3    Emily    14:25    14:30      -300        300
4    Joe      11:21        .         .          .
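The remerge query keeps every B row whose distance equals the minimum, so a tie returns more than one row for the same A record (a case raised in the comments). A minimal sketch of one possible tie-breaker, assuming the earlier B time should win; the dataset names `a_tie`, `b_tie`, `C_tie` and `C_one` are hypothetical.

```sas
/* Hypothetical tie: 9:30 is 30 minutes from both 9:00 and 10:00 */
data a_tie;
  input id name :$5. timea :time8.;
  format timea time5.;
cards;
1 John 9:30
;

data b_tie;
  input id name :$5. timeb :time8.;
  format timeb time5.;
cards;
1 John 9:00
1 John 10:00
;

proc sql;
  /* Same remerge logic as the answer: both tied rows are returned */
  create table C_tie as
    select x.*, y.timeb, abs(x.timea - y.timeb) as distance
    from a_tie as x
    left join b_tie as y
      on x.id = y.id and x.name = y.name
    group by x.id, x.name, x.timea
    having min(calculated distance) = calculated distance;

  /* Assumed tie-breaker: keep the earliest of the tied B times */
  create table C_one as
    select id, name, timea, min(timeb) as timeb format=time5.
    from C_tie
    group by id, name, timea;
quit;
```

Using `max(timeb)` instead would prefer the later B time; which rule is right depends on the data.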

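This won't work if you swap the order of two rows in one of the input datasets. You are not joining on the nearest time; you are joining on position.

Thanks. But the actual observations in tables a and b don't line up one-to-one; it can be an n-to-1 match. Ideally the approach would join a and b by finding the closest time within each key. I don't know why the code above overwrites first.key.

By closest in time, do you mean the minimum absolute difference? If there is a tie, what do you want to do: keep multiple records or use some kind of tie-breaker? What about records in A that have no match in B? And what about records that are in B but not in A?

@Tom If a record in A is not in B at all, then there is nothing to join. Tables A and B are both large datasets. If I take this approach, I'm afraid it will use too many resources and take a long time. Thanks.

How big? It is only a problem if the average number of times per id is large.

There are millions of obs in each table. But you're right: if the average number of times per id is not too large, it won't take too long.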
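If the number of times per id is large, the SQL join compares every A time with every B time of the same id and name. One alternative, not taken from the thread, is to interleave A and B, then carry the latest earlier B time forward with RETAIN in one pass and the next later B time backward in a second pass, so the cost is dominated by sorting. A minimal sketch, assuming ties go to the earlier B time; the names `both`, `fwd`, `want_fast`, `t`, `is_a`, `prev_b` and `next_b` are hypothetical.

```sas
data a;
  input id name :$5. timea :time8.;
  format timea time5.;
cards;
1 John 9:17
1 John 10:25
2 Chris 9:17
3 Emily 14:25
4 Joe 11:21
;

data b;
  input id name :$5. timeb :time8.;
  format timeb time5.;
cards;
1 John 9:00
1 John 10:00
2 Chris 9:00
3 Emily 14:30
;

/* Stack A and B on a common time variable T; IS_A flags the A rows */
data both;
  set a(in=inA rename=(timea=t)) b(rename=(timeb=t));
  is_a = inA;
run;

/* At equal times B (is_a=0) sorts first, so it is picked up as "previous" */
proc sort data=both;
  by id name t is_a;
run;

/* Forward pass: PREV_B = latest B time at or before T within id/name */
data fwd;
  set both;
  by id name;
  retain prev_b;
  if first.name then prev_b = .;
  if not is_a then prev_b = t;
run;

proc sort data=fwd;
  by id name descending t;
run;

/* Backward pass: NEXT_B = earliest B time after T; keep the nearer of the two */
data want_fast;
  set fwd;
  by id name;
  retain next_b;
  if first.name then next_b = .;
  if not is_a then next_b = t;
  else do;
    if missing(next_b) or (not missing(prev_b) and t - prev_b <= next_b - t)
      then timeb = prev_b;       /* earlier B time wins ties */
    else timeb = next_b;
    output;                      /* only A rows are written */
  end;
  keep id name t timeb;
  rename t = timea;
  format timeb time5.;
run;

proc sort data=want_fast;
  by id name timea;
run;
```

Every A row is kept, and an A row with no B rows for its id and name (Joe here) gets a missing `timeb`.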