SAS: updating a history table using delete and insert


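If an id in temp matches an id in hist, delete that row from hist and insert the row from temp; if the id does not match any row in hist, append the row to hist. I have two datasets with identical columns: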

data hist;
input id1 id2 var1 $;
cards;
1 10 a
2 20 b
3 30 c 
4 40 d
5 50 e
;
run;
data temp;
input id1 id2 var1 $;
cards;
2 20 b
3 30 d
4 40 e
5 50 f
6 60 g
;
run;

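temp will hold the current records, and history will hold all historical rows.

If a row exists in temp (an update), I want to delete and re-insert that row in the history dataset. If a row from temp does not exist in history, append it to the history dataset.

The history dataset will have at least 100 million records. From the input above, this is the output I want: rows 1, 2, 3 and 4 in temp match rows in history, so they will be updated; row 5 in temp has no match, so it will be appended to history.

Sorry for the earlier confusion. I think it should be clear now.

Thanks,
Sam.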
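One way to achieve what you describe is a union in SQL. By default, union does not append duplicate records. It does take some time, though, because it has to identify those records.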
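A minimal sketch of the union approach, assuming history and temp have identical columns. The output name history_new is hypothetical and is used only so that history is not read and replaced in the same query.

```sas
proc sql;
  /* Assumption: history and temp share the same columns.
     history_new is a hypothetical output table name. */
  create table history_new as
    select * from history
    union
    select * from temp;
quit;
```

Note that union only drops rows that are identical in every column. A row in temp with the same id but a different var1 is kept alongside the old row rather than replacing it.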
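If you have enough memory to load the keys of history into an in-memory hash table, that is probably the fastest option. Load history into the hash, set temp, find() the current row, and if it is not found, add the row to the hash. Then, at the end, output the hash back to history.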
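The steps just described could be sketched roughly as follows. This assumes the columns from the question (id1, id2, var1), that (id1, id2) uniquely identifies a row, and that the whole of history fits in memory. It uses check() rather than find() so that a match does not overwrite the values just read from temp.

```sas
data _null_;
  /* Assumption: both datasets have id1, id2, var1 and
     (id1, id2) identifies a row. */
  if 0 then set history;               /* define host variables in the PDV */
  declare hash h(dataset:'history', ordered:'a');
  h.defineKey('id1', 'id2');
  h.defineData('id1', 'id2', 'var1');
  h.defineDone();

  do until (eof);
    set temp end=eof;
    /* check() tests for the key without touching PDV values */
    if h.check() ne 0 then rc = h.add();
    /* to overwrite matching rows as well, h.replace() could be
       used in place of the check()/add() pair */
  end;

  rc = h.output(dataset:'history');    /* write the hash back to history */
  stop;
run;
```

Writing the hash back rebuilds history from scratch, so any index on the old dataset would have to be recreated afterwards.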
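Depending on the relative sizes of temp and history, you could also just output the rows to be added to a dataset, rather than adding them to the hash, and then proc append that dataset. If temp is less than a quarter or so of the size of history, that is probably the better option.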
data temp_to_Add;
  set temp;
  if _n_=1 then do;
    declare hash h(dataset:'history');
    h.defineKey('keyvars');
    h.defineDone();
  end;
  rc = h.find();
  if rc ne 0 then output;
run;
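To finish this variant, the rows collected in temp_to_Add still have to be appended to history. A minimal sketch, assuming temp_to_Add has the same columns as history (any helper variable such as rc would need to be dropped first, or the FORCE option used):

```sas
/* Append only the unmatched rows collected in temp_to_Add */
proc append base=history data=temp_to_Add(drop=rc);
run;
```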

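If you also need to check temp against itself, you can add the row to the hash when rc ne 0.

There is a way to have SAS and PROC APPEND do this for you. Without knowing your data columns, I'll speak in general terms. I assume you have one or more fields that define uniqueness.

First, create a unique index on HISTORY: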
proc sql;
create unique index hist_unq on HISTORY(col1, col2, ...);
quit;

Then use PROC APPEND:
proc append base=history data=temp force;
run;

You will see a warning in the log, and notice that fewer observations were appended than were read. For example:
NOTE: Appending WORK.TEMP to WORK.HISTORY.
WARNING: Duplicate values not allowed on index hist_unq for file HISTORY, 36 observations rejected.
NOTE: There were 70 observations read from the data set WORK.TEMP.
NOTE: 34 observations added.
NOTE: The data set WORK.HISTORY has 144 observations and 2 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

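I think DomPazz's answer is the best so far for its simplicity, but if you cannot conveniently define a unique index on history, or you really want to avoid any warning messages, the more complex data step approach below can work. It should be almost as fast as proc append, while avoiding the memory and CPU requirements of the hash object approach suggested by Joe.

Note: although this does not require a unique index on history, if temp contains more rows for a matching id than history does, the surplus rows from temp will be appended even though they are not wanted.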
data history;
input id var1 $;
cards;
1 a
2 b
3 c 
4 d
5 e
5 f
;
run;

data temp;
input id var1 $;
cards;
3 d
4 e
5 f
6 g
6 h
;
run;

proc datasets lib = work nolist;
    modify history;
    index create id;
    run;
quit;

data history;
    set temp;
    modify history key = id;
    if _iorc_ ne 0 then do;
        _ERROR_ = 0;
        output;
    end;
run;

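How it works:

1. Read a record from temp (the first set statement).
2. Try to read the first record from history with a matching id value.
3. If no match is found, output a new record.

Since we never read a row from history for any of the non-matching ids from temp, the values of all the other variables that we read from temp in step 1 are still present in the PDV.

The index on history is not updated until the data step has finished adding, modifying and deleting rows. So for the last row of temp, even though we have already added a row with id=6 to history, we do not find it via the index in later iterations of the same data step, and both rows are added.

Edit: an alternative version that updates history for matching ids: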
data history;
    set temp(rename = (var1 = new_var1));
    do _n_ = 1 by 1 until(eof);
        modify history key = id end = eof;
        if _iorc_ = 0 then do;
            var1 = new_var1;
            replace;
        end;
        else do;
            _ERROR_ = 0;
            if not(eof and _n_ > 1) then output;
        end;        
    end;
run;

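One drawback here is that you have to rename all the non-id variables in temp, because when the modify statement reads a row from history, it overwrites the variables of the same name in the PDV. If you have unique indexes on id for both temp and history, you can avoid this as follows: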
data history;
    set temp(keep = id);
    modify history key = id;
    if _iorc_ = 0 then do;
        set temp key = id;
        replace;
    end;
    else do;
        _ERROR_ = 0;
        output;
    end;        
run;

If a matching record is read from history, overwriting the values first read from temp, the extra set statement reads the relevant record from temp again.
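The last variant relies on both datasets carrying a unique index on id. A minimal sketch of creating them, assuming neither dataset already has an index named id and that id values are unique within each dataset (which is not true of the sample history above, where id 5 appears twice):

```sas
proc datasets lib=work nolist;
  /* Assumption: no index named id exists yet on either dataset */
  modify temp;
  index create id / unique;
  run;
  modify history;
  index create id / unique;
  run;
quit;
```

If the non-unique index from the earlier example is still present on history, it would need to be removed first (for example with index delete id;) before the unique one can be created.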
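Comments:

Beat me to it again! The real issue will be performance as history grows. I actually considered doing it the other way round, setting history and hashing temp, but the append solution is probably faster than I would think, isn't it? Or could you use modify to make that a better option? (Also, wouldn't looking up every history row in a small temp table be too expensive? It might be worth it if you had a fairly simple way to filter out most of history, such as a date boundary.)

I would use a unique key on history and PROC APPEND. I'm not sure what the fastest method is on a large number of records.

I've added an answer based on modify.

Ah, interesting. I didn't know SAS would give you a warning in this case; I would have expected an error. On my machine, appending a few thousand rows (with another few thousand thrown out as duplicates) to a 1.5G history table took less than half a second.

Did you account for the index creation time?

You can also combine the nowarn option with force on proc append. That will remove the warning message as well. I don't usually advocate suppressing warning messages, but this is one of the exceptions, and it is fairly harmless (at least in my experience). If there is a bug, it is easy to fix.

@RobertPenridge, it's actually the NOWARN option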