SAS: Identifying variables that are constant within each id (stacked dataset)

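I inherited a poorly documented person-month dataset that has no matching person-level dataset. I want to determine which variables in the person-month dataset are really person-level variables (constant across all observations with a given id), such as date of birth. A simple example: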

id month dob    race tx weight
1  1     4058   1    1  105
1  2     4058   1    1  107
1  3     4058   1    2  108
2  1     1622   2    1  153
2  2     1622   2    3  153
2  3     1622   2    2  153

In this example, dob and race are fixed within each individual, but tx and weight vary by month within at least one individual.

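I came up with a clumsy solution: use PROC MEANS to compute the standard deviation of every numeric variable by id, then take the maximum of those standard deviations. If the maximum std for a variable is 0, that column does not vary within any individual, and I can flag the variable as fixed (person-level).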
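The approach described above might look roughly like the following. This is a sketch, not the asker's actual code: the data set name HAVE, the intermediate names STD_BY_ID and MAX_STD, and the explicit variable list are assumptions taken from the small example.

```sas
/* Step 1: standard deviation of each variable within each id.      */
/* CLASS is used instead of BY, so HAVE does not need to be sorted. */
proc means data=have noprint nway;
  class id;
  var dob race tx weight;
  output out=std_by_id(drop=_type_ _freq_) std=;
run;

/* Step 2: largest within-id standard deviation for each variable. */
proc means data=std_by_id noprint;
  var dob race tx weight;
  output out=max_std(drop=_type_ _freq_) max=;
run;

/* A maximum of 0 suggests the variable never varies within an id. */
proc print data=max_std noobs;
run;
```

Two caveats: the standard deviation is missing for an id with a single observation, and this only covers numeric variables.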

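I feel like I am missing a simpler statistical test for determining which of my hundreds of variables are fixed across each person's observations and which vary within a person. Any suggestions?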

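I don't think there is a "simple statistical test" beyond the standard deviation you computed, or even min/max (which amounts to roughly the same thing). I would probably just do it in PROC SQL, unless there are a very large number of variables; this also lets you handle character variables. In the example below, which uses sashelp.cars with make as the grouping variable, the inner query returns 1 for a variable when its minimum equals its maximum within a group, and the outer query takes the minimum of those flags, so the result is 1 only when the variable is constant within every group.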

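/* Emits a column that is 1 when &var. has a single value in the group, else 0 */
%macro comparetype(var);
  max(&var.) = min(&var.) as &var.
%mend comparetype;

proc sql;
  /* Outer query: 1 only if the variable is constant within every make */
  select min(origin) as origin, min(type) as type, min(drivetrain) as drivetrain,
         min(msrp) as msrp, min(invoice) as invoice, min(enginesize) as enginesize
  from (
    /* Inner query: one row per make with a 1/0 flag per variable */
    select make,
           %comparetype(origin),
           %comparetype(type),
           %comparetype(drivetrain),
           %comparetype(msrp),
           %comparetype(invoice),
           %comparetype(enginesize)
    from sashelp.cars
    group by make
  );
quit;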
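With hundreds of variables, typing each column by hand is impractical. One possible workaround is to build the two select lists from DICTIONARY.COLUMNS. The sketch below assumes the data are in WORK.HAVE with key variable id, that month should be skipped, and that all variable names are ordinary SAS names; the macro variable names inner_list and outer_list and the output table IS_CONSTANT are made up for illustration.

```sas
proc sql noprint;
  /* Inner list: "min(x)=max(x) as x" for every variable except the keys */
  select catx(' ', cats('min(', name, ')=max(', name, ')'), 'as', name)
    into :inner_list separated by ', '
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE'
      and upcase(name) not in ('ID', 'MONTH');

  /* Outer list: "min(x) as x" for the same variables */
  select catx(' ', cats('min(', name, ')'), 'as', name)
    into :outer_list separated by ', '
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE'
      and upcase(name) not in ('ID', 'MONTH');
quit;

proc sql;
  /* One row; a value of 1 means the variable is constant within every id */
  create table is_constant as
  select &outer_list.
  from (select id, &inner_list.
        from have
        group by id);
quit;
```

Because MIN and MAX ignore missing values, a variable that is missing on some rows but otherwise constant within an id is still flagged as constant here. A macro variable also has a length limit, so a very wide data set may need the list split into batches.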

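I would use the NLEVELS option in PROC FREQ. It reports the number of distinct values for each variable, so you are looking for variables whose number of distinct values (NLevels) is 1. Because the BY statement makes the count per id, a person-level variable is one whose NLevels is 1 for every id. Here is the code; if the data are not already sorted by id, sort them first.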

data have;
input id month dob race tx weight;
cards;
1  1     4058   1    1  105
1  2     4058   1    1  107
1  3     4058   1    2  108
2  1     1622   2    1  153
2  2     1622   2    3  153
2  3     1622   2    2  153
;
run;

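/* Keep only the NLevels table and save it to data set WANT */
ods select nlevels;
ods output nlevels=want;
ods noresults;   /* do not add entries to the Results window */
proc freq data=have nlevels;
  by id;         /* one NLevels table per id; HAVE must be sorted by id */
run;
ods results;     /* turn the Results window back on */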
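The WANT data set has one row per id and variable, so it still needs to be collapsed to one row per variable. A minimal sketch follows, assuming the ODS output columns are named TableVar and NLevels (worth confirming with PROC CONTENTS on WANT); the table name PERSON_LEVEL and the column max_levels are made up for illustration.

```sas
proc sql;
  /* Keep variables that have exactly one level within every id */
  create table person_level as
  select TableVar, max(NLevels) as max_levels
  from want
  group by TableVar
  having max(NLevels) = 1;
quit;

proc print data=person_level noobs;
run;
```

For the six-row example, this should leave dob and race. Note that NLevels counts a missing value as a level, so a variable that is sometimes missing within an id will not pass this filter.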