Remove groups in which no observation contains a specific value in SAS
I want to delete every group in which no observation has NUM=14. So something like this:
Original data:
ID NUM
1 14
1 12
1 10
2 13
2 11
2 10
3 14
3 10
Since none of the observations with ID=2 has NUM=14, group 2 is deleted.
The result should look like this:
ID NUM
1 14
1 12
1 10
3 14
3 10
This is what I have so far, but it doesn't seem to work:
data originaldat;
set newdat;
by ID;
If first.ID then do;
IF NUM EQ 14 then Score = 100;
Else Score = 10;
end;
else SCORE+1;
run;
data newdat;
set newdat;
If score LT 50 then delete;
run;
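You're sort of getting at the DoW loop here, but not quite doing it right. The problem (assuming the data/set names are mistyped and are not an actual error in your program) is that the first data step doesn't assign 100 to every row of the group; it only assigns it when the 14 is on the first row. What you need is one row per ID value carrying a keep/don't-keep decision.
You can get that by keeping your first data step but retaining the score and outputting only one row per ID. Your code would actually work if you just fixed the data/set typo, since 14 is the first row of each group here; but it only works when 14 is the first row.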
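To make the "only when 14 is the first row" point concrete, here is a minimal sketch. The dataset names reordered, scored, and filtered are hypothetical, and the data is group 1 from the question with the 14 moved to the second row; the scoring logic is the question's own, with the data/set names corrected.

```sas
/* Hypothetical data: group 1 from the question, but 14 is no longer first */
data reordered;
    input ID NUM;
    datalines;
1 12
1 14
1 10
;
run;

/* The question's logic with the data/set names corrected */
data scored;
    set reordered;
    by ID;
    if first.ID then do;
        if NUM eq 14 then Score = 100;
        else Score = 10;
    end;
    else Score + 1;   /* sum statement: Score is retained automatically */
run;

data filtered;
    set scored;
    if Score lt 50 then delete;
run;
```

Here Score starts at 10 on the first row and only counts up to 12, so every row of group 1 is deleted even though the group contains a 14.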
data originaldat;
input ID NUM ;
datalines;
1 14
1 12
1 10
2 13
2 11
2 10
3 14
3 10
;;;;
run;
data has_fourteen;
set originaldat;
by ID;
retain keep;
If first.ID then keep=0;
if num=14 then keep=1;
if last.id then output;
run;
data newdata;
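/* keep= restricts has_fourteen to ID and keep, so that its NUM (the last NUM
   of each group) does not overwrite NUM on the first row of each group */
merge originaldat has_fourteen (keep=id keep);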
by id;
if keep=1;
run;
This works by merging the one-row-per-ID value onto the whole dataset.
A double DoW loop also works:
data newdata;
keep=0;
do _n_=1 by 1 until (last.id);
set originaldat;
by id;
if num=14 then keep=1;
end;
do _n_=1 by 1 until (last.id);
set originaldat;
by id;
if keep=1 then output;
end;
run;
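This works because it iterates over the dataset twice. For each ID, it first reads through all of that ID's records looking for a 14, and sets keep to 1 if it finds one. It then reads all the records for that ID again and outputs them if keep=1. Then it moves on to the next set of records by ID.
The way to do this with proc sql is: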
proc sql;
create table newdat as
select *
from originaldat
where ID in (
select ID
from originaldat
where NUM = 14
);
quit;
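The subquery selects the IDs of the groups that contain an observation where NUM=14. The where clause then restricts the selected data to only those groups.
The equivalent data step approach is: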
/* Get all the groups that contain an observation where NUM = 14 */
data keepGroups;
set originaldat;
if NUM = 14;
keep ID;
run;
/* Sort both data sets to ensure the data step merge works as expected */
proc sort data = originaldat;
by ID;
run;
/* Make sure there are no duplicate values in the groups to be kept */
proc sort data = keepGroups nodupkey;
by ID;
run;
/*
Merge the original data with the groups to keep and only keep records
where an observation exists in the groups to keep dataset
*/
data newdat;
merge
originaldat
keepGroups (in = k);
by ID;
if k;
run;
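In both data steps, the if statement is used to output an observation only when the condition is met. In the second case, k is a temporary variable whose value is 1 (true) when a value is read from keepGroups and 0 (false) otherwise.
First of all, thanks. I have a couple of questions. 1. What does DoW stand for? 2. In "if keep=1 then output;", why wouldn't "if num=14 then output" work?
1. The DoW loop is a way of iterating over observations with an explicit do loop around the set statement instead of relying on the implicit data step loop. It is a general-purpose tool that can reduce the number of times the data has to be read for some kinds of processing. 2. The second set statement reads in new values of num, while keep stays constant for every iteration of the second loop. if keep=1 ensures that the whole group is output, whereas if num=14 would output only the observations where num is 14.
DoW stands for Do Whitlock, referring to Ian Whitlock, a SAS luminary who, along with Paul Dorfman, pioneered the technique.
What if I want to keep the groups that have both a 14 and another variable equal to 0? Can I add "Num=14 and var=0"? I want to keep a group that contains an observation with Num=14 and var=0.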
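One possible way to handle that last case, sketched under assumptions: the second variable is taken to be a numeric column named var (the name used in the comment; it is not in the sample data), and the datasets withvar and newdat2 are hypothetical. Putting both conditions in the subquery keeps a group only when a single observation satisfies both at once.

```sas
/* Hypothetical sample data: var is an assumed numeric column */
data withvar;
    input ID NUM var;
    datalines;
1 14 0
1 12 1
2 14 1
2 10 0
3 13 0
;
run;

proc sql;
    create table newdat2 as
    select *
    from withvar
    where ID in (
        /* IDs with at least one row where both conditions hold together */
        select ID
        from withvar
        where NUM = 14 and var = 0
    );
quit;
```

With this data only group 1 is kept: group 2 has NUM=14 and var=0 on different observations, and group 3 has no 14. In the double DoW loop version, the corresponding change would be to test if num=14 and var=0 then keep=1; in the first loop.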
data in;
input id num;
cards;
1 14
1 12
1 10
2 16
2 13
3 14
3 67
;
/* To find out the list of groups which contains num=14, use below SQL */
proc sql;
select distinct id into :lst separated by ','
from in
where num = 14;
quit;
/* If you want to create a new data set with only groups containing num=14 then use following data step */
data out;
set in;
where id in (&lst.);
run;