Recode values in a character variable based on another character variable in SAS
jrnlfile is a dataset containing journal names and identifiers. Here are the first 6 obs:
id journal issn
56201 ACTA HAEMATOLOGICA 0001-5792
94365 ACTA PHARMACOLOGICA SINICA
10334 ACTA PHARMACOLOGICA SINICA 1671-4083
55123 ADVANCES IN ENZYME REGULATION 0065-2571
90002 AGING
10403 AGING 1945-4589
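To try the approaches on this page, the six rows above can be loaded with a short DATA step. This is only a sketch: the variable types and lengths (numeric id, journal as $40, issn as $9) and the pipe delimiter are assumptions, not taken from the real jrnlfile.

```sas
/* Sketch: build a small jrnlfile from the six sample rows.
   Types, lengths and the '|' delimiter are assumptions. */
data jrnlfile;
  infile datalines dsd dlm='|' truncover;
  length id 8 journal $40 issn $9;
  input id journal issn;
  datalines;
56201|ACTA HAEMATOLOGICA|0001-5792
94365|ACTA PHARMACOLOGICA SINICA|
10334|ACTA PHARMACOLOGICA SINICA|1671-4083
55123|ADVANCES IN ENZYME REGULATION|0065-2571
90002|AGING|
10403|AGING|1945-4589
;
run;
```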
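Compare id 94365 and 10334. These obs name the same journal, so they need the same issn. Obs with a missing issn value almost always have at least one partner obs with a matching journal name and the correct issn. Where that is the case, I want to recode the missing issn so that it contains the issn seen in other instances that mention the same journal. The modified dataset want would look like this: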
id journal issn
56201 ACTA HAEMATOLOGICA 0001-5792
94365 ACTA PHARMACOLOGICA SINICA 1671-4083
10334 ACTA PHARMACOLOGICA SINICA 1671-4083
55123 ADVANCES IN ENZYME REGULATION 0065-2571
90002 AGING 1945-4589
10403 AGING 1945-4589
I currently use if-else statements in a data step to fill in the missing issn values with the matching entry for each journal:
data want;
set jrnlfile;
if journal = "ACTA PHARMACOLOGICA SINICA" then issn = "1671-4083";
else if journal = "AGING" then issn = "1945-4589";
/*continue for 7,000 other journals*/
run;
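But jrnlfile contains 50,000 obs and 7,000 unique journals, so this takes a lot of time and is very error-prone. I got halfway with another approach, but issn is not numeric, so I could not solve the problem by simply adding values to it.

What is a more efficient and systematic way to get want from jrnlfile?

You can use the retain statement. But this code has its limitations. For rows with an empty issn, the first issn found for that journal is used. Each journal group must have at least one issn.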
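/* Sort so that non-missing issn values come first within each journal */
proc sort data=JRNLFILE;
  by journal descending issn;
run;
data want;
  set JRNLFILE;
  /* t_issn keeps the first issn of the current journal across rows */
  retain t_issn;
  by journal descending issn;
  if first.journal then
  do;
    /* If even the first row has no issn, the group has no value to copy */
    if issn="" then do;
      put "ERROR: there is no issn val for group";
      stop;
    end;
    else t_issn=issn;
  end;
  /* Fill a missing issn with the retained value */
  if issn="" then
  do;
    issn=t_issn;
  end;
run;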
For example, if you use this table:
+-------+------------------------------+-----------+
| id | journal | issn |
+-------+------------------------------+-----------+
| 94365 | ACTA PHARMACOLOGICA SINICA | |
| 10334 | ACTA PHARMACOLOGICA SINICA | 1671-4083 |
| 1 | ACTA PHARMACOLOGICA SINICA | A_TEST |
| 2 | ACTA PHARMACOLOGICA SINICA | WAS |
| 3 | ACTA PHARMACOLOGICA SINICA | SATRTED |
+-------+------------------------------+-----------+
You will get:
+-------+----------------------------+-----------+--------+
| id | journal | issn | t_issn |
+-------+----------------------------+-----------+--------+
| 2 | ACTA PHARMACOLOGICA SINICA | WAS | WAS |
| 3 | ACTA PHARMACOLOGICA SINICA | SATRTED | WAS |
| 1 | ACTA PHARMACOLOGICA SINICA | A_TEST | WAS |
| 10334 | ACTA PHARMACOLOGICA SINICA | 1671-4083 | WAS |
| 94365 | ACTA PHARMACOLOGICA SINICA | WAS | WAS |
+-------+----------------------------+-----------+--------+
Error example. If you use this table:
+-------+------------------------------+-----------+
| id | journal | issn |
+-------+------------------------------+-----------+
| 56201 | ACTA HAEMATOLOGICA | 0001-5792 |
| 94365 | ACTA PHARMACOLOGICA SINICA | |
+-------+------------------------------+-----------+
You will get an error:
ERROR: there is no issn val for group
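* t_issn is left in the output to make it easier to see how the code works :)

If the data is sorted by journal and the valid values come first, a simple update might work. But watch out if other variables have missing values.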
data want;
update have(obs=0) have ;
by journal;
output;
run;
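The UPDATE approach depends on the sort order, so it may help to see the two steps together. A minimal sketch, assuming the input dataset is named have as in the step above, and that sorting with issn descending puts a non-missing issn first in each journal (a missing character value sorts lowest):

```sas
/* Sketch: sort first, then carry values forward with UPDATE.
   The dataset name have is taken from the answer's code. */
proc sort data=have;
  by journal descending issn;
run;

data want;
  /* have(obs=0) is an empty master; each row of have is applied as a
     transaction, and a missing value in a transaction does not
     overwrite the value already held for that journal. */
  update have(obs=0) have;
  by journal;
  output;  /* write every row, not only the last one per journal */
run;
```

Because UPDATE treats every variable this way, a missing value in any other column is also carried forward from the previous row of the same journal, which is the caveat mentioned above.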
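You could try merging the data with the non-missing values of ISSN. This only requires the data to be sorted by journal. It works well if there is only one unique non-missing value per journal. If there are multiple non-missing values, the result is not as good.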
data want ;
merge have have(keep=journal issn rename=(issn=_2) where=(not missing(_2)));
by journal;
if missing(issn) then issn=_2;
drop _2;
run;
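Since the merge works best when each journal has a single non-missing ISSN, it can be worth checking for journals that have more than one before relying on it. A sketch using PROC SQL, assuming the dataset is named have as above; the output table name issn_conflicts is a hypothetical name chosen for illustration:

```sas
/* Sketch: list journals with more than one distinct non-missing issn.
   issn_conflicts is a made-up name for the result table. */
proc sql;
  create table issn_conflicts as
  select journal,
         count(distinct issn) as n_issn
  from have
  where not missing(issn)
  group by journal
  having count(distinct issn) > 1;
quit;
```

If issn_conflicts comes back empty, every journal has at most one non-missing ISSN. Otherwise, the listed journals need a decision about which ISSN is the correct one.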
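Can you sort the dataset by journal so that the value you want to use for the ISSN is on the first observation for that journal? For example: by journal descending issn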