If statement: creating subgroups while keeping the parent group

Tags: if-statement, sas, grouping


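I have a SAS dataset in which each row contains information about a company. One variable, sect, holds the company's sector. There are up to 700 sectors, so I need to group them, for example Industry: mining, textile, food, and so on. I use the following code to do this: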

data temp; set in.data;

if sect in ('01' '02' '03' '04' '05' '06')    then my_sector='Industry' ;

run;

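The tricky part is that I also want to show detail within each group. For example, I want to keep the overall "Industry" group, but I also want subgroups inside "Industry". But when I run the following code: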

data temp; set in.data;

if sect in ('01' '02' '03' '04' '05' '06')    then my_sector='Industry' ;

if sect in ('01' '02')    then my_sector='Textile Industry' ;
if sect in ('03' '04')    then my_sector='Food Industry' ;
if sect in ('05')         then my_sector='Mining Industry' ;

run;

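the subgroups work fine, but the overall "Industry" group ends up containing only what is not in any of the subgroups (here, sect 06). So the question is: how can I have the 3 subgroups and also an "Industry" group that includes all 3 subgroups plus the rest of the industry (sect 01 through 06)?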

Thanks a lot,

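After the first if statement executes, the later if statements overwrite the value of the variable. You should either use another variable or use output statements, so that a matching row is written once for the group and once for the subgroup: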
data want; set have;
length my_sector $20;

if sect in ('01' '02' '03' '04' '05' '06') then do;
   my_sector='Industry' ;
   output;
   if sect in ('01' '02') then do;
      my_sector='Textile Industry' ;
      output;
   end;
   if sect in ('03' '04') then do;
      my_sector='Food Industry' ;
      output;
   end;
   if sect in ('05') then do;
      my_sector='Mining Industry' ;
      output;
   end;
end;
run;

The have dataset:
+------+
| sect |
+------+
|   01 |
|   02 |
|   03 |
|   04 |
|   05 |
|   06 |
+------+

The want dataset:
+------+------------------+
| sect |    my_sector     |
+------+------------------+
|   01 | Industry         |
|   01 | Textile Industry |
|   02 | Industry         |
|   02 | Textile Industry |
|   03 | Industry         |
|   03 | Food Industry    |
|   04 | Industry         |
|   04 | Food Industry    |
|   05 | Industry         |
|   05 | Mining Industry  |
|   06 | Industry         |
+------+------------------+
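The other option mentioned above, a separate variable, keeps one row per company instead of duplicating rows. A minimal sketch, assuming the same have dataset; the names want_two_vars and my_subsector are hypothetical and not part of the original answer:

```sas
data want_two_vars;
   set have;
   /* Declare lengths up front so longer values are not truncated */
   length my_sector $20 my_subsector $20;

   /* Parent group: every sect from 01 to 06 */
   if sect in ('01' '02' '03' '04' '05' '06') then my_sector = 'Industry';

   /* Subgroup goes in its own variable, so the parent value is not overwritten */
   if sect in ('01' '02') then my_subsector = 'Textile Industry';
   else if sect in ('03' '04') then my_subsector = 'Food Industry';
   else if sect in ('05') then my_subsector = 'Mining Industry';
run;
```

With this layout sect 06 has my_sector='Industry' and a blank my_subsector, and reports can group by either variable.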

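Regarding "I also want subgroups": you need to map sect into a hierarchy of two variables, for example a parent "group" and a child "sector".
In general, it is best to keep the mapping in a control table, so that group and sector names are easy to maintain when new or additional sect values come into play.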
data sect_mappings;
  length sect $2 group $8 sector $8;
  input sect group sector;
  datalines;
01     Industry  Textile
02     Industry  Textile
03     Industry  Food
04     Industry  Food
05     Industry  Mining
;
run;

The control table can be left joined to the original data to map sect values to group and sector names (that is, to formatted values of sect).
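Such a left join could be sketched as follows. This assumes a company-level dataset named have with a two-character sect variable, plus the sect_mappings table above; the output name have_mapped is hypothetical:

```sas
/* Attach group and sector names to every company row.            */
/* A left join keeps companies whose sect has no mapping; their    */
/* group and sector values are simply left blank.                  */
proc sql;
  create table have_mapped as
  select h.*,
         m.group,
         m.sector
  from have as h
  left join sect_mappings as m
    on h.sect = m.sect;
quit;
```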
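One of the most powerful features of SAS is the concept of custom formats: you can leave the sect value unchanged and have procedures process it automatically according to its formatted value.
Custom formats can be created directly from the mapping table. In this case you need two custom formats, one that maps sect to group and another that maps sect to sector.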

data sector_cntlin;
  set sect_mappings;
  rename sect=start;

  fmtname = '$sect_group'; label=group;  output;
  fmtname = '$sect_name' ; label=sector; output;
run;

proc sort data=sector_cntlin;
  by fmtname;
run;

proc format cntlin=sector_cntlin;
run;
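To check that both formats were created as intended, they can be applied with the put function in a short data step. This is only an illustrative sketch; the variable names grp and sec are hypothetical:

```sas
data _null_;
  length sect $2;
  /* Try a few mapped codes and one code (06) that is not in the mapping table */
  do sect = '01', '03', '05', '06';
    grp = put(sect, $sect_group.);
    sec = put(sect, $sect_name.);
    put sect= grp= sec=;   /* writes the raw and formatted values to the log */
  end;
run;
```

A value with no entry in the format, such as 06 here, comes back unformatted, which is why unmapped values stay visible in later reports.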

These formats can be used in SAS procedures that aggregate information with sect as a by or class variable.
Suppose you have this sample data:
data have;
  call streaminit(123);
  do company_id = 1 to 100;
    sect = put(ceil(10 * rand('uniform')), z2.);  /* random code from 01 to 10 */
    output;
  end;
run;
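As a quick illustration of a procedure aggregating by formatted value, PROC FREQ can count the companies in have per group simply by attaching the $sect_group format. A small sketch, assuming the formats created earlier exist in the current session:

```sas
/* PROC FREQ groups rows by the formatted value of sect, */
/* so codes 01-05 collapse into a single Industry row.   */
proc freq data=have;
  tables sect;
  format sect $sect_group.;
  label sect = 'Group';
run;
```

Swapping in $sect_name. gives counts per sector instead, without changing the stored data.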

To get a two-level hierarchy (using formatted values), you need an additional variable that repeats the sect value:
data want;
  set have;
  sect_repeat = sect;
run;

At this point, if sect and sect_repeat are both class variables, a SAS procedure will process the data as a two-level hierarchy:
proc tabulate data=want;
  class sect sect_repeat;

  format sect $sect_group.;
  format sect_repeat $sect_name.;

  label sect = 'Group';
  label sect_repeat = 'Sector';

  table   
    sect * (all sect_repeat)
  , n
  / nocellmerge
  ;
run;

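This produces a report with one row per group total and one row per sector within the group. Unmapped values of sect stand out in the report; they can be added to the mapping table or filtered out with a where statement.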
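One possible way to write that where filter, sketched under the assumption that an unmapped sect is returned unchanged by the format: keep only the rows whose formatted value differs from the raw value.

```sas
proc tabulate data=want;
  /* Keep only sect values that the $sect_group format actually maps */
  where put(sect, $sect_group.) ne sect;

  class sect sect_repeat;

  format sect $sect_group.;
  format sect_repeat $sect_name.;

  label sect = 'Group';
  label sect_repeat = 'Sector';

  table
    sect * (all sect_repeat)
  , n
  / nocellmerge
  ;
run;
```

An explicit list such as where sect in ('01' '02' '03' '04' '05') would also work, but it has to be kept in sync with the mapping table by hand.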
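Great! But does this also work with 3 or more levels? Say I want to keep this exact layout but also zoom in on the Food industry: it has these 19 companies, but of those 19, 9 are in the cereals industry and 10 in the vegetables industry. Thanks.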