How to select rows from a dataset in SAS Enterprise Guide and apply a specific condition to each subset


I have the following table:

COMPANY_NAME | GROUP | COUNTRY | STATUS  
COM1         |   1   |    DE   | DELETED   
COM2         |   1   |    DE   | REMAINING  
COM3         |   1   |    UK   | DELETED  
COM4         |   2   |    ES   | DELETED  
COM5         |   2   |    FR   | DELETED  
COM6         |   3   |    RO   | DELETED  
COM7         |   3   |    BG   | DELETED  
COM8         |   3   |    ES   | REMAINING  
COM9         |   3   |    ES   | DELETED 

I need to get:

COMPANY_NAME | GROUP | COUNTRY | STATUS  
COM3         |   1   |    UK   | DELETED  
COM4         |   2   |    ES   | DELETED  
COM5         |   2   |    FR   | DELETED  
COM6         |   3   |    RO   | DELETED  
COM7         |   3   |    BG   | DELETED

So I need every entry with status DELETED for which, within the same group, no company in the same country has status REMAINING. I can use PROC SQL or a data step.

What I have tried so far:

PROC SQL;
CREATE TABLE WORK.OUTPUT AS
SELECT *
FROM WORK.INPUT
WHERE STATUS = 'DELETED' AND COUNTRY NOT IN (SELECT COUNTRY FROM WORK.INPUT WHERE STATUS = 'REMAINING');
QUIT;
But this obviously excludes those countries in every group, not only in the group where the REMAINING company sits.

I also tried a data step:

DATA WORK.OUTPUT;
SET WORK.INPUT;
BY GROUP COUNTRY;

IF NOT (STATUS = 'DELETED' AND COUNTRY NOT IN (COUNTRY WHERE STATUS = 'REMAINING')) THEN DELETE; 

RUN;
But the syntax is incorrect, because I do not know the proper way to write it.

Try this:

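proc sql;
/* Build a combined country_group key so the exclusion applies per group. */
/* String comparison is case-sensitive; literals match the question's data. */
select * from your_table
where status = 'DELETED' and 
      catx("_",country,group) not in 
         (select catx("_",country,group) from your_table where status='REMAINING');
quit;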
Output:

COMPANY_NAME | GROUP | COUNTRY | STATUS
COM3         |   1   |    UK   | DELETED
COM4         |   2   |    ES   | DELETED
COM5         |   2   |    FR   | DELETED
COM6         |   3   |    RO   | DELETED
COM7         |   3   |    BG   | DELETED
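
The same per-group exclusion can also be written without concatenating a key. The following is only a sketch, assuming the question's table name `WORK.INPUT` and its upper-case status values; the aliases `a` and `b` are introduced here for illustration. A correlated `NOT EXISTS` subquery checks, for each DELETED row, whether a REMAINING row exists with the same group and country:

```sas
proc sql;
   create table work.output as
   select a.*
   from work.input as a
   where a.status = 'DELETED'
     /* keep the row only if no REMAINING row shares its group and country */
     and not exists (
            select 1
            from work.input as b
            where b.group   = a.group
              and b.country = a.country
              and b.status  = 'REMAINING'
         );
quit;
```

Comparing the two columns directly avoids any dependence on how the numeric `group` value is converted to text inside `catx`.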

Your attempt shows you are on the right track.

A data step solution is:

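data want(drop = remain_list);
   /* Holds up to seven two-letter codes; widen if a group can have more */
   length remain_list $ 20;

   /* Pass 1: collect the countries with a REMAINING company in this group */
   do until(last.group);
      set have;
      by group;

      if status = 'REMAINING' and not find(remain_list, strip(country)) then
         remain_list = catx(' ', remain_list, country);
   end;

   /* Pass 2: re-read the same group; keep DELETED rows whose country is not listed */
   do until(last.group);
      set have;
      by group;

      if status = 'DELETED' and not find(remain_list, strip(country)) then
         output;
   end;
run;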
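
Both loops rely on `by group`, which requires the input to be ordered by `group`. A minimal sketch of preparing it, assuming the question's table is `WORK.INPUT` and that `have` is simply a sorted copy of it (both are assumptions made for illustration):

```sas
/* Sort a copy of the input so BY-group processing works in the data step */
proc sort data=work.input out=work.have;
   by group;
run;
```

The sample data in the question is already in group order, so this step only matters for unsorted input.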

Comments:

Should the first record in the target be COM3, not COM1?

@DomPazz I changed it. Thank you.

"I can use PROC SQL or a data step": which one did you try? Please share the code and all error messages.

^ To add, StackOverflow is not a code-writing service. If you do not know where to start, change the status code to a numeric code that can be sorted, so that the minimum or maximum, and there being only one, is what you want. Then it is easy to solve.

@reeza Why do we need to convert the status code to a number?

Glad to help :)

Thanks. I went with the PROC SQL solution because it is less code. But it was great to see the data step approach as well.
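
One possible reading of the numeric-code suggestion in the comments, sketched under assumptions: in SAS the comparison `status = 'REMAINING'` evaluates to 1 or 0, so its maximum within a (group, country) pair is 0 only when no company there is REMAINING. The table name `WORK.INPUT` follows the question; the alias `a` is introduced for illustration, and this is not necessarily what the commenter had in mind.

```sas
proc sql;
   create table work.output as
   select a.*
   from work.input as a
   group by a.group, a.country
   /* max(...) = 0: no REMAINING company in this group/country pair */
   having max(a.status = 'REMAINING') = 0
      and a.status = 'DELETED';
quit;
```

Because the query selects detail rows alongside a summary function, PROC SQL remerges the summary value back onto the original rows and writes a note to the log; the row order of the result may also differ from the input.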