SAS: Reshaping data with PROC TRANSPOSE or a simpler procedure
I have a dataset:
Period Store Item feature_1 feature_2
JAN A a1 3 4
JAN A a2 4 9
JAN A a3 2 1
JAN A a4 4 9
FEB A a2 4 9
JAN B a2 3 1
FEB B b2 4 9
.....
I want to obtain this dataset:
Period Store a1_feature_1 a1_feature_2 a2_feature_1 a2_feature_2....
JAN A 3 4 4 9
FEB A . . 4 9
JAN B . . 3 1
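Here each observation of the final dataset holds one store in one period, with all the features of every item in that same observation.
My initial guess was to first use a macro to create the variables a1_feature_1, a1_feature_2, a2_feature_1, a2_feature_2,
and then use proc sql with group by to collapse across store and period.
Is there a way to do this transformation with proc transpose, sql, or some other simpler step?

Here is one approach: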
data have;
input (Period Store Item) ($) feature_1 feature_2; cards;
JAN A a1 3 4
JAN A a2 4 9
JAN A a3 2 1
JAN A a4 4 9
FEB A a2 4 9
JAN B a2 3 1
FEB B b2 4 9
;
run;
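proc sql noprint;
    /* One list of target variable names per feature, e.g. a1_feature_1 a2_feature_1 ... */
    /* (no comma before INTO; suffixes match the names asked for in the question) */
    select distinct cats(item, '_feature_1'), cats(item, '_feature_2')
        into :item_list1 separated by ' ', :item_list2 separated by ' '
        from have;
quit;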
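data want;
    /* DOW loop: read every row of one store/period group, then output a single row */
    do until (last.period);
        set have;
        by store period notsorted;
        array f1[*] &item_list1;
        array f2[*] &item_list2;
        do i = 1 to dim(f1);
            /* =: is a prefix comparison, so item a1 matches a1_feature_1 */
            /* (it would also match a10_..., so item names must not be prefixes of each other) */
            if vname(f1[i]) eq: trim(item) then do;
                f1[i] = feature_1;
                f2[i] = feature_2;
            end;
        end;
    end;
    /* item only holds the last value read in the group, so drop it as well */
    drop i item feature_1 feature_2;
run;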
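As written, the DATA step creates all the first-feature columns before the second-feature ones. If the interleaved order from the question matters, one possible sketch is to read the column names of `want` back from `dictionary.columns`, sort them by name, and reorder with a `retain` statement placed before `set`. The names `ordered_list` and `want_ordered` are made up for this illustration, and a plain alphabetical sort is an assumption that only gives the intended order while item names sort naturally (for example, a10 would sort before a2).

```sas
proc sql noprint;
    /* Collect the transposed column names in alphabetical order. */
    /* ordered_list is a hypothetical macro variable name. */
    select name
        into :ordered_list separated by ' '
        from dictionary.columns
        where libname = 'WORK' and memname = 'WANT'
          and upcase(name) not in ('PERIOD', 'STORE', 'ITEM')
        order by name;
quit;

data want_ordered;
    /* Naming the variables before SET fixes their position in the output; */
    /* their types and lengths still come from WANT. */
    retain Period Store &ordered_list;
    set want;
run;
```

Because the list is built from whatever columns `want` actually contains, the step does not depend on the exact suffix used when the arrays were defined.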
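Note: this does not produce the column order shown in the question, but you can easily fix that with some extra logic if needed. Also, the macro variables used to define the arrays can only hold the variable names for a few thousand items (a macro variable value is limited to 65,534 characters).

You could also put all the feature variables into a list, transpose the data once per variable with a matching name suffix, and then merge the results. With this approach you do not have to type all the feature variables by hand, because the sql step does that for you: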
data test;
length Period Store Item $5 feature_1 feature_2 8;
input Period $ Store $ Item $ feature_1 feature_2;
datalines;
JAN A a1 3 4
JAN A a2 4 9
JAN A a3 2 1
JAN A a4 4 9
FEB A a2 4 9
JAN B a2 3 1
FEB B b2 4 9
;
run;
proc sort data = test;
by PERIOD STORE;
run;
/* How many feature_ variables do I have? */
proc sql noprint;
    create table features as
        select NAME
        from dictionary.columns
        where libname = "WORK" and memname = "TEST" and index(NAME, "feature") > 0;

    /* Put them into a list to loop over */
    select NAME
        into :feature_list separated by " "
        from features;
quit;
%put &feature_list.;
/* Transpose the data once per feature_ variable, then merge when finished */
%MACRO loop_over(feature_list);
    %local i feature;   /* keep the loop variables out of the global scope */
    %do i = 1 %to %sysfunc(countw(&feature_list.));
        %let feature = %scan(&feature_list., &i.);
        proc transpose data = test out = trans_&feature.(drop=_NAME_) SUFFIX = _&feature.;
            by PERIOD STORE;
            id ITEM;
            var &feature.;
        run;
    %end;
    /* trans_: picks up every WORK dataset whose name starts with trans_ */
    data merged;
        merge trans_:;
        by PERIOD STORE;
    run;
%MEND;
%loop_over(&feature_list.);
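One approach is to transpose the two features separately, using item as the id and feature_1 resp. feature_2 as the suffix, and then merge the resulting tables. This works, but for larger datasets it will perform considerably worse than the array approach, because of the number of proc transpose calls, each of which has to read the whole dataset twice.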
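If the number of proc transpose calls is the concern, a common alternative is a "double transpose": go long first, build the target column name, then go wide once, so the number of transposes stays at two however many features there are. The following is only a sketch under some assumptions: it starts from the `have` dataset defined earlier, it assumes at most one row per period/store/item, and the names `sorted`, `long`, `colname`, `value`, and `want_wide` are hypothetical. The combined item and feature names must also be valid SAS variable names (32 characters at most).

```sas
proc sort data=have out=sorted;
    by period store item;
run;

/* Step 1: long format, one row per period/store/item/feature. */
/* _NAME_ holds the feature variable name, COL1 its value. */
proc transpose data=sorted out=long(rename=(col1=value));
    by period store item;
    var feature_1 feature_2;
run;

/* Step 2: build the target column name, e.g. a1_feature_1. */
data long;
    set long;
    length colname $32;
    colname = catx('_', item, _name_);
run;

/* Step 3: wide format, one row per period/store. */
proc transpose data=long out=want_wide(drop=_name_);
    by period store;
    id colname;
    var value;
run;
```

Columns appear in the order in which the names are first encountered, and item/feature combinations that never occur in a group are left missing, as in the desired output.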