Counting over columns in SAS

I have a data set in SAS with individuals as rows and one variable per period as columns. It looks like this:

data have;
input individual t1 t2 t3;
cards;
1 112 111 123
2 112 111 123
3 111 111 123
4 112 112 111
;
run;
I would like SAS to count how many times each number occurs in each time period. So I would like to get something like this:

data want;
input count t1 t2 t3;
cards;
111 1 3 1
112 3 1 0
123 0 0 3
;
run;
I can do this with PROC FREQ, but the output does not work well when I have many columns.

Thanks.

If your computer has enough memory to hold the entire output, a hash object may be a viable solution:

data have;
    input individual t1 t2 t3;
    cards;
1 112 111 123
2 112 111 123
3 111 111 123
4 112 112 111
;
run;

data _null_;
    if _n_=1 then
        do;
            /*This is to construct a Hash, where count is tracked and t1-t3 is maintained*/
            declare hash h(ordered:'a');
            h.definekey('count');
            h.definedata('count', 't1','t2','t3');
            h.definedone();
            call missing(count, t1,t2,t3);
        end;

    set have(rename=(t1-t3=_t1-_t3)) 
        /*rename to avoid conflict between input data and Hash object*/
    end=last;

    array _t(*) _t:;
    array t(*) t:;

    /*The key is to set up two arrays, one is for input data, 
    another is for Hash feed,  and maneuver their index variable accordingly*/
    do i=1 to dim(_t);
        count=_t(i);
        rc=h.find(); /*search the Hash and bring back data elements if found*/

        /*If there is a match, then corresponding 't' will increase by '1'*/
        if rc=0 then
            t(i)+1;
        else
            do;
                /*If there is no match, then corresponding 't' will be initialized as '1', 
                                    and all of the other 't' reset to '0'*/
                do j=1 to dim(t);
                    t(j)=0;
                end;

                t(i)=1;
            end;

        rc=h.replace(); /*Update the Hash*/
    end;

    if last then
        rc=h.output(dataset:'want');
run;

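First, restructure the data so that it is taller, with one row per individual and period. This makes it easier to work with. We also create a flag that will be used later as a counter.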

data have2;
  set have;
  array arr[*] t1-t3;

  flag = 1;
  do period=lbound(arr) to hbound(arr);    
    val = arr[period];
    output;
  end;

  keep period val flag;
run;
Summarize the data so that we know how many times each value occurs in each period:

proc sql noprint;
  create table smry as
  select val,
         period,
         sum(flag) as count
  from have2
  group by 1,2
  order by 1,2
  ;
quit;
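Transpose the data so that there is one row per value, followed by the count for each period:

/* prefix=t names the new columns t1, t2, ... as in the desired output */
proc transpose data=smry out=want(drop=_name_) prefix=t;
  by val;
  id period;
  var count;
run;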
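Note that when you define the array in the first step, you can use the following notation instead, which allows for a dynamic number of periods: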

  array arr[*] t:;
This assumes that every variable in the data set whose name begins with "t" should go into the array.
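
One gap remains relative to the desired output: PROC TRANSPOSE leaves a missing value wherever a value never occurs in a period (123 in period 1, for example), while the desired table shows 0. A minimal sketch of one way to fill those in, assuming the transposed data set is named want as above. The names want_filled and cnt are hypothetical, chosen only for illustration:

```sas
/* Sketch: replace missing counts with 0 after the transpose.        */
/* Assumes WANT holds val plus one numeric count column per period.  */
data want_filled;              /* hypothetical output name */
  set want;
  /* _numeric_ also picks up val, which is never missing here, */
  /* so only the count columns are actually changed.           */
  array cnt[*] _numeric_;
  do i = 1 to dim(cnt);
    if missing(cnt[i]) then cnt[i] = 0;
  end;
  drop i;
run;
```

Using _numeric_ keeps the step independent of how the count columns happen to be named.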

Try the following:

%macro freq(dsn);
proc sql;
select name into:name separated by ' ' from dictionary.columns where libname='WORK' and memname='HAVE' and name like 't%';
quit;
%let ncol=%sysfunc(countw(&name,%str( )));
%do i=1 %to &ncol;
%let col=%scan(&name,&i);
proc freq data=have;
table &col/out=col_&i(keep=&col count rename=(&col=count count=&col));
run;
%end;
data temp;
    merge
    %do i=1 %to &ncol;
      col_&i
    %end;
 ;
by count;
run;
data want;
   set temp;
   array vars t:;
   do over vars;
     if missing(vars) then vars=0;
   end;
run;
%mend;

%freq(have)

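In general, it is a bad idea to have data in the metadata. Here the period is encoded in the Tn variable names, when you really want it to be a grouping variable. Having said that, you can still have your cake and eat it too.

PROC SUMMARY can quickly get the counts for each Tn, and then you have a much smaller data set to work with. Here is one approach that should work well for many time periods: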

data have;
   input individual t1 t2 t3;
   cards;
1 112 111 123
2 112 111 123
3 111 111 123
4 112 112 111
;;;;
run;
proc print;
   run;
proc summary data=have chartype;
   class t:;
   ways 1;
   output out=want;
   run;
proc print;
   run;
data want;
   set want;
   p = findc(_type_,'1');
   c = coalesce(of t1-t3);
   run;
proc print;
   run;
proc summary data=want nway completetypes;
   class c p;
   freq _freq_;
   output out=final;
   run;
proc print;
   run;
proc transpose data=final out=morefinal(drop=_name_) prefix=t;
   by c;
   id p;
   var _freq_;
   run;
proc print;
   run;

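_TYPE_ and _FREQ_ are being turned into italics markers by SO. Add a \ in front.

"In general, it is a bad idea to have data in the metadata." Why?

Because it makes the data harder to work with. In this example the time period and the values (111, 112) both occupy the same space. If the time period were a variable, it could be used directly in a CLASS statement. I did that in the second PROC SUMMARY.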
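
To illustrate the last comment, here is a minimal sketch of what "period as a variable" could look like, assuming the HAVE data set from the question already exists. The data set names long, counts and want_class are hypothetical, and this is only one possible arrangement, not necessarily what the commenter had in mind:

```sas
/* Sketch: store the period as a variable, then use it in CLASS. */
data long;                       /* hypothetical name */
  set have;
  array arr[*] t:;               /* every variable starting with "t" */
  do period = 1 to dim(arr);
    val = arr[period];
    output;                      /* one row per individual and period */
  end;
  keep period val;
run;

/* COMPLETETYPES adds value/period combinations that never occur, */
/* with _FREQ_ = 0; NWAY keeps only the val*period level.         */
proc summary data=long nway completetypes;
  class val period;
  output out=counts(keep=val period _freq_);
run;

/* One row per value, one column (t1, t2, ...) per period. */
proc transpose data=counts out=want_class(drop=_name_) prefix=t;
  by val;
  id period;
  var _freq_;
run;
```

Because the period is an ordinary CLASS variable here, the zero counts come from COMPLETETYPES and no per-column logic is needed.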