Hash merge macro: using file record identifiers (RID); hash + POINT= key
I am updating this macro to the hash + POINT= key approach. The current version of the macro has started to exceed memory limits on one of our data runs. I am asking for help because I do not have much time and have never really analyzed this code, since it only recently became part of my process.
What I really do not understand is how the RID is set up and how to incorporate it into our macro. In fact, I do not even know whether this is possible with our current macro.
Any help would be greatly appreciated.
%macro hashmerge2(varnm,onto,from,byvars,obsqty);
  %let data_vars = %trim (&varnm);
  %let data_vars_a = %sysfunc(tranwrd(&data_vars.,%str( ),%str(" , ")));
  %let data_vars_b = %sysfunc(tranwrd(&data_vars.,%str( ), %str(,)));
  %let data_key = %trim (&byvars);
  %let data_key = %sysfunc(tranwrd(&data_key.,%str( ), %str(" , ")));
  %if %index(&varnm,' ') > 0 %then %let varnm3=%substr(%substr(&varnm,1,%index(&varnm,' ')),1,4);
  %else %let varnm3=%substr(&varnm,1,4);
  data &onto(drop=rc) miss&varnm3(drop=rc);
    if 0 then set &onto &from(keep=&varnm. &byvars.);
    declare hash h_merge (dataset: "&from.");
    rc = h_merge.DefineKey ("&data_key.");
    rc = h_merge.DefineData ("&data_vars_a.");
    rc = h_merge.DefineDone ();
    do until (eof);
      set &onto end = eof;
      call missing(&data_vars_b.);
      rc = h_merge.find ();
      if rc = 0 then do;
        output &onto;
        from = "&from.";
      end;
      else do;
        output miss&varnm3 &onto;
        from = "&onto.";
      end;
    end;
    stop;
  run;
%mend;
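I think this is what you are looking for, but it still needs to load all of the key values from the "lookup" table into the hash object. It does save space, though: the non-key variables are not loaded into the hash at all, and they are read from the lookup table only for observations whose keys match.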
%macro hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm /* Space delimited list of variable to retrieve */
,onto /* Dataset to update */
,from /* Dataset to get values from */
,byvars /* Space delimited list of key variables to match on */
);
  %local missds key_vars;
  %let missds=%scan(&varnm,1,%str( ));
  %let missds=miss%substr(&missds,1,%sysfunc(min(28,%length(&missds))));
  %let key_vars="%sysfunc(tranwrd(%sysfunc(compbl(&byvars)),%str( )," "))";
  data &onto(drop=rc) &missds(drop=rc);
    if 0 then set &onto &from(keep=&varnm. &byvars.);
    declare hash h_merge ();
    rc = h_merge.DefineKey (&key_vars);
    rc = h_merge.DefineData ('_point');
    rc = h_merge.DefineDone ();
    /* Load only the keys plus the observation number of each lookup row */
    do _point=1 to _nobs;
      set &from(keep=&byvars) point=_point nobs=_nobs;
      rc = h_merge.add();
    end;
    do until (eof);
      set &onto end = eof;
      rc = h_merge.find ();
      if rc = 0 then do;
        /* Key found: fetch the non-key variables directly by row number */
        set &from (keep=&varnm) point=_point;
        from = "&from.";
        output &onto;
      end;
      else do;
        call missing(of &varnm);
        from = "&onto.";
        output ;
      end;
    end;
    stop;
  run;
%mend hash_merge_point;
Here is a simple example:
data lookup;
input id age sex $1.;
cards;
1 10 F
2 20 .
4 30 M
;
data master ;
input id wt ;
cards;
1 100
2 150
3 180
4 200
;
%hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm=age sex /* Space delimited list of variable to retrieve */
,onto=master /* Dataset to update */
,from=lookup /* Dataset to get values from */
,byvars=id /* Space delimited list of key variables to match on */
);
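Since the question asks how the record identifier is set up, it may help to look at the key-to-row-number mapping on its own. The following is a minimal sketch, not part of the macro: it rebuilds the same small lookup table, stores each key together with its observation number, and writes the hash contents to a dataset. The names `rid` and `work.key_index` are hypothetical and chosen only for this illustration.

```sas
/* Same lookup table as in the example above. */
data lookup;
  input id age sex $1.;
cards;
1 10 F
2 20 .
4 30 M
;

data _null_;
  if 0 then set lookup(keep=id);   /* puts ID in the PDV at compile time */
  declare hash h (ordered: 'a');
  rc = h.defineKey('id');
  rc = h.defineData('id', 'rid');  /* rid: hypothetical name for the row number */
  rc = h.defineDone();
  do _point = 1 to _nobs;          /* NOBS= is populated at compile time */
    set lookup(keep=id) point=_point nobs=_nobs;
    rid = _point;                  /* remember which observation holds this key */
    rc = h.add();
  end;
  /* Dump the key -> row number index so it can be inspected. */
  rc = h.output(dataset: 'work.key_index');
  stop;                            /* required: POINT= never hits end of file */
run;

proc print data=work.key_index;
run;
```

The printed dataset should contain one row per lookup key with the observation number it came from, which is the only payload the hash needs to hold. The macro then uses that number with `POINT=` to read the non-key variables on demand.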
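If the target table already contains the variables created by the merge (so you only want to overwrite the current values), you can use a MODIFY statement instead of a SET statement to modify the dataset in place. Before trying this, though, you may want to make sure you have a backup of the table. Also note that if you want to flag the source in the from variable, that variable needs to exist in the table as well.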
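As a rough sketch of that preparation, assuming an existing `work.master` that has only `id` and `wt`: the first step takes a copy, and the second adds the variables that MODIFY will later overwrite, because MODIFY cannot add variables to a dataset. The backup name `work.master_backup` is hypothetical, and the lengths are assumptions that should match the lookup table.

```sas
/* Small stand-in for an existing master table (id and wt only). */
data work.master;
  input id wt;
cards;
1 100
2 150
3 180
4 200
;

/* Keep a copy before any in-place update (hypothetical backup name). */
data work.master_backup;
  set work.master;
run;

/* One-time rewrite that adds the target variables, initially missing. */
/* Lengths are assumptions: they should match the lookup variables.    */
data work.master;
  length age 8 sex $1 from $50;
  set work.master;
run;
```

If the in-place update goes wrong, the table can be restored by copying `work.master_backup` back over `work.master`.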
Using this updated master table:
data master ;
input id wt ;
length age 8 sex $1 from $50;
cards;
1 100
2 150
3 180
4 200
;
and this version of the macro:
%macro hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm /* Space delimited list of variable to retrieve */
,onto /* Dataset to update */
,from /* Dataset to get values from */
,byvars /* Space delimited list of key variables to match on */
);
  %local key_vars;
  %let key_vars="%sysfunc(tranwrd(%sysfunc(compbl(&byvars)),%str( )," "))";
  data &onto;
    if 0 then set &onto (keep=&byvars.);
    declare hash h_merge ();
    rc = h_merge.DefineKey (&key_vars);
    rc = h_merge.DefineData ('_point');
    rc = h_merge.DefineDone ();
    do _point=1 to _nobs;
      set &from(keep=&byvars) point=_point nobs=_nobs;
      rc = h_merge.add();
    end;
    do until (eof);
      /* MODIFY updates the existing dataset in place */
      modify &onto end = eof;
      rc = h_merge.find ();
      if rc = 0 then do;
        set &from (keep=&varnm) point=_point;
        from = "&from.";
      end;
      else from = "&onto.";
      replace;
    end;
    stop;
  run;
%mend hash_merge_point;
If you run this code:
proc print data=master;
title 'BEFORE';
run;
%hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm=age sex /* Space delimited list of variable to retrieve */
,onto=master /* Dataset to update */
,from=lookup /* Dataset to get values from */
,byvars=id /* Space delimited list of key variables to match on */
);
proc print data=master;
title 'AFTER';
run;
you can compare the BEFORE and AFTER listings to see the result.
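What are your starting input and ending output? Maybe someone can help you better. As good as the hashing technique is, the hash table is stored in memory, and that becomes its main limitation.

We are appending 8 variables from a dataset of about 15.6 million rows onto a dataset of about 5.7 million rows, and both grow over time. Our key is a unique combination of a char variable and a date variable. The output goes into the onto dataset. We have another approach that works well but is slower. We also use this macro in other steps, but those are less demanding and do not have the same problem.

How many key variables are there? That will determine how much memory the hash table needs.

Only two key variables: one is a character string of length 15 and the other is a date in date8. format.

Thanks for the input, I think I've got it. I am testing now and will let you know if I run into any problems!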
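One way to get a rough feel for the memory question raised in these comments is to ask the hash object itself. The sketch below is an illustration under assumptions, not a measurement of the real data: it builds a tiny stand-in table with a 15-character key and a date key (the names `work.lookup_demo`, `acct`, `dt`, `val` and `rid` are hypothetical), loads only the keys plus a row number, and reports the `ITEM_SIZE` and `NUM_ITEMS` attributes in the log.

```sas
/* Tiny stand-in for the real lookup table (hypothetical names and values). */
data work.lookup_demo;
  length acct $15;
  input acct $ dt :date9. val;
  format dt date9.;
cards;
A000000000001 01JAN2020 1
A000000000002 02JAN2020 2
;

data _null_;
  if 0 then set work.lookup_demo(keep=acct dt);
  declare hash h ();
  rc = h.defineKey('acct', 'dt');
  rc = h.defineData('rid');        /* only the row number is stored as data */
  rc = h.defineDone();
  do _point = 1 to _nobs;
    set work.lookup_demo(keep=acct dt) point=_point nobs=_nobs;
    rid = _point;
    rc = h.add();
  end;
  item_bytes = h.item_size;        /* bytes per item as reported by SAS */
  n_items    = h.num_items;        /* number of items currently loaded */
  approx_mb  = item_bytes * n_items / 1024**2;
  put item_bytes= n_items= approx_mb= 12.3;
  stop;
run;
```

Multiplying the reported `item_bytes` by the real row count (about 15.6 million here) gives an approximate lower bound on the memory the key index would need; the attribute does not account for all of the hash object's overhead, so some headroom is advisable.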