Delete groups whose observations never contain a specific value in SAS

I want to delete entire groups in which no observation has NUM=14.

Something like this. Original data:

ID  NUM 
1  14
1  12
1  10
2  13
2  11
2  10
3  14
3  10
Since none of the observations in ID=2 has NUM=14, I delete group 2. The result should look like this:

ID  NUM 
1  14
1  12
1  10
3  14
3  10
This is what I have so far, but it doesn't seem to work:

data originaldat;
set newdat;
by ID;
If first.ID then do;
        IF NUM EQ 14 then Score = 100;
        Else Score = 10;
    end;
else SCORE+1;
run; 

data newdat;
set newdat;
   If score LT 50 then delete;
run;

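You were getting partway to a DoW loop there, but not quite doing it right. The problem (assuming the data/set names are typos and not actual errors in your program) is that the first data step does not attach the 100 to every row, only to the row with 14. What you need is one "row" per ID value carrying a keep/don't-keep decision.

You can do this with your first data step, but retain the score and output only one row per ID. Your code would actually have worked if you had just fixed the data/set typos, because 14 is the first row; but it only works when 14 is the first row of the group.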

data originaldat;   
input ID  NUM ;
datalines;
1  14
1  12
1  10
2  13
2  11
2  10
3  14
3  10
;;;;
run;

data has_fourteen;
set originaldat;
by ID;
retain keep;
If first.ID then keep=0;
if num=14 then keep=1;
if last.id then output;
run; 

data newdata;
  merge originaldat has_fourteen;
  by id;
  if keep=1;
run;
This works by merging a one-row-per-ID keep value back onto the whole dataset.

A double DoW loop also works:

data newdata;
  keep=0;
  do _n_=1 by 1 until (last.id);
    set originaldat;
    by id;
    if num=14 then keep=1;
  end;
  do _n_=1 by 1 until (last.id);
    set originaldat;
    by id;
    if keep=1 then output;
  end;
run;

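This works because it iterates over the dataset twice. For each ID, it iterates once through all of that ID's records looking for a 14, and sets keep to 1 if it finds one. Then it reads all of that ID's records again and outputs them if keep=1. Then it moves on to the next group of records by ID.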

A way to do this with proc sql is:

proc sql;
    create table newdat as
    select * 
    from originaldat
    where ID in (
        select ID 
        from originaldat
        where NUM = 14
    );
quit;
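The subquery selects the IDs of the groups that contain an observation where NUM=14. The where clause then restricts the selected data to only those groups.
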
An equivalent data step approach would be:

/* Get all the groups that contain an observation where NUM = 14 */
data keepGroups;
    set originaldat;
    if NUM = 14;
    keep ID;
run;
/* Sort both data sets to ensure the data step merge works as expected */
proc sort data = originaldat;
    by ID;
run;
/* Make sure there are no duplicate values in the groups to be kept */
proc sort data = keepGroups nodupkey;
    by ID;
run;
/* 
    Merge the original data with the groups to keep and only keep records
    where an observation exists in the groups to keep dataset
*/
data newdat;
    merge 
        originaldat 
        keepGroups (in = k);
    by ID;
    if k;
run;

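In both data steps, the if statement is used to output observations only when the condition is met. In the second case, k is a temporary variable whose value is 1 (true) when an observation is read from keepGroups and 0 (false) otherwise.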

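First of all, thanks. I have a couple of questions. 1. What does DoW stand for? 2. In "if keep=1 then output;", why doesn't "if num=14 then output" work?

1. A DoW loop is a way of iterating over observations with a do loop instead of the implicit data step set loop. It is a general-purpose tool that can reduce the number of times the data has to be read for some kinds of processing. 2. The second set statement reads new values of num, while keep stays the same for every iteration of the second loop. if keep=1 ensures that the whole group is output, while if num=14 outputs only the observations where num is 14.

DoW stands for Do Whitlock, referring to Ian Whitlock, a SAS luminary, who pioneered the technique along with Paul Dorfman.

What if I want to keep groups where an observation has 14 and another variable equal to 0? Can I add "Num=14 and var=0"? I want to keep a group that contains an observation with Num=14 and var=0.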
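For the last question, one option is to change only the condition in the first loop of the double DoW approach. The following is a minimal sketch, not a tested answer from the thread. It assumes originaldat carries an extra numeric variable named var (hypothetical, since the sample data has no such column), that the data is sorted by ID, and that both conditions must hold on the same observation.

```sas
/* Sketch only: assumes originaldat has a numeric variable VAR   */
/* (not in the sample data) and is already sorted by ID.          */
data newdata;
  keep = 0;
  /* First pass over the group: flag it if any single observation */
  /* has both num=14 and var=0.                                   */
  do _n_ = 1 by 1 until (last.id);
    set originaldat;
    by id;
    if num = 14 and var = 0 then keep = 1;
  end;
  /* Second pass over the same group: output every observation    */
  /* of a flagged group.                                          */
  do _n_ = 1 by 1 until (last.id);
    set originaldat;
    by id;
    if keep = 1 then output;
  end;
  drop keep;
run;
```

If the two conditions may be met by different observations in the group, two separate flags (one per condition) would be needed instead of one combined test.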
data in;
input id num;
cards;
1 14
1 12
1 10
2 16
2 13
3 14
3 67
;

/* To find out the list of groups which contains num=14, use below SQL */

proc sql;
  select distinct id into :lst separated by ','
  from in
  where num = 14;
quit;

/* If you want to create a new data set with only groups containing num=14 then use following data step */

data out;
 set in;
 where id in (&lst.);
run;
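
One caveat with the macro-variable approach: if no observation has num = 14, the select returns no rows, lst is not populated, and the where clause cannot resolve to a valid list. Below is a hedged sketch of a guard. The macro name keep_groups_with_14 is made up for illustration, and the check relies on the automatic macro variable SQLOBS, which PROC SQL sets to the number of rows the select processed.

```sas
/* Sketch only: the macro name is hypothetical; work.in and out   */
/* follow the data set names used above.                          */
%macro keep_groups_with_14;
  %local lst;
  proc sql noprint;
    select distinct id into :lst separated by ','
    from work.in
    where num = 14;
  quit;

  %if &sqlobs > 0 %then %do;
    /* At least one group qualifies: filter on the list of IDs */
    data out;
      set work.in;
      where id in (&lst.);
    run;
  %end;
  %else %do;
    /* No group qualifies: create an empty data set with the same variables */
    data out;
      set work.in (obs=0);
    run;
  %end;
%mend keep_groups_with_14;

%keep_groups_with_14
```

The list-in-a-macro-variable technique also assumes id is numeric; character IDs would need to be quoted when the list is built.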