SAS Proc Tabulate: I have 50+ variables in my column dimension, how can I reduce that to only 5?


After running this I got these results, and now I want to limit the columns to only 5 variables.


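Use a where statement to restrict which col1 values appear in the table.

You can restrict on a property of the value, such as starting with the letter A:

where col1 =: 'A';

You can also restrict on an explicit list of values:

where col1 in ('Apples', 'Lentils', 'Oranges', 'Sardines', 'Cucumber');

Sample data, used by the frequency tables further down (which also show the ALL counts):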

data have;
  call streaminit(123);

  * Note: only 49 values are listed for the 50 elements, so col1s[50] stays blank;
  array col1s[50] $20 _temporary_ (
  "Apples" "Avocados" "Bananas" "Blueberries" "Oranges" "Strawberries" "Eggs" "Lean beef" "Chicken breasts" "Lamb" "Almonds" "Chia seeds" "Coconuts" "Macadamia nuts" "Walnuts" "Asparagus" "Bell peppers" "Broccoli" "Carrots" "Cauliflower" "Cucumber" "Garlic" "Kale" "Onions" "Tomatoes" "Salmon" "Sardines" "Shellfish" "Shrimp" "Trout" "Tuna" "Brown rice" "Oats" "Quinoa" "Ezekiel bread" "Green beans" "Kidney beans" "Lentils" "Peanuts" "Cheese" "Whole milk" "Yogurt" "Butter" "Coconut oil" "Olive oil" "Potatoes" "Sweet potatoes" "Vinegar" "Dark chocolate"
  );

  do row1 = 1 to 20;
    do _n_ = 1 to 1000;
      col1 = col1s[ceil(rand('uniform',50))];
      x = ceil(rand('uniform',250));
      output;
    end;
  end;
run;
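Showing only the top 5 col1 values requires an extra step that determines which col1 values meet that criterion. The resulting list can then be used in a where ... in clause, as in the final PROC SQL and PROC TABULATE steps. The tables without that restriction come first: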

* col1 values shown in order by value;
proc tabulate data=have;
  class     row1 col1;
  table ALL row1,col1;
run;

* col1 values shown in order by ALL frequency;
proc tabulate data=have;
  class     row1;
  class     col1 / order=freq;
  table ALL row1,col1;
run;

* Letter T col1 values shown in order by ALL frequency;
proc tabulate data=have;
  where col1 =: 'T';
  class     row1;
  class     col1 / order=freq;
  table ALL row1,col1;
run;
Output tables (images not preserved): col1 in value order; col1 in frequency order; col1 values starting with T; top 5 col1 values.

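Comments:

Hey, in the where statement you are actually supplying the names of the columns. Can that be made dynamic? How do we find the top 5 columns?

How do you define which col1 values correspond to "top 5"? In the answer's sample code the statistic is N. Should the top 5 be the col1 values with the highest N over all rows? Using where to restrict the data that is processed means adding a step before the tabulation in which those col1 values are determined.

My col1 variable is a string, for example apple, jam, pine, and row1 is similar. I created a cross tab that counts each row1 value for each col1 value. Now that I have the results, I want to restrict them to 5 values in col1 instead of 50.

Please improve the question by editing it and adding some sample data. I assume col1='apple' should appear only when 'apple' is among the five most frequent values, but data would clarify your question.

Hi Richard, I am missing a few rows that have blank values. How do I make sure that even rows that are blank in all 5 columns still appear?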
* determine the 5 col1s with highest frequency count;
proc sql noprint outobs=5;
  select 
    quote(col1) into :top5_col1_list separated by ' '
  from 
    ( select col1, count(*) as N from  have
      group by col1 
    )
  order by N descending;
quit;

proc tabulate data=have;
  where col1 in (&top5_col1_list);
  class     row1;
  class     col1 / order=freq;
  table ALL row1,col1;
run;
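
On the last comment (rows that vanish because they have no counts in any of the 5 columns): the where statement drops those observations before PROC TABULATE sees them, so a row1 value with no top-5 data has nothing left to display. One possible approach, sketched here and not tested against your data, is the CLASSDATA= option, which names a data set of class-level combinations that should appear in the table even when they have no observations. The data set name all_levels is made up for this illustration, and the sketch assumes "blank" means "no observations for those 5 col1 values" and that &top5_col1_list was created by the PROC SQL step above.

```sas
* Hypothetical helper table: every row1 value crossed with each top-5 col1 value;
proc sql;
  create table all_levels as
  select r.row1, c.col1
  from (select distinct row1 from have) as r,
       (select distinct col1
          from have(where=(col1 in (&top5_col1_list)))) as c;
quit;

* Combinations listed in all_levels are shown even if the filtered data has none;
proc tabulate data=have classdata=all_levels;
  where col1 in (&top5_col1_list);
  class     row1;
  class     col1 / order=freq;
  table ALL row1,col1;
run;
```

With the sample data above every row1 value already has observations for the top 5 col1 values, so the output should look the same as the previous table. The difference would only show up in data where some row1 values have none.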