SAS: group by a counter for each variable - primary key creation
I have some data that needs to be split into 12 or so different groups. There are no keys, and the order of the data is important. The data contains a number of groups, and within these groups there are single and/or nested groups. Because the data is in a hierarchical format, each group will be split off. Each "group" therefore has its own format, and all of them need to be joined back together onto one line (or many lines). Sample data file:
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
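This is the hierarchy that should exist once the data has been read in. I think there may eventually be several tables that can be joined together. (The numbers illustrate the parent/child levels.)
This is industrial gas data, so in this flow each MTPNT can have many ASSETs, and each of those can have many REGSTs, since that is where the meter readings are held.
I have tried using BY groups and first. processing, but I have never dealt with this kind of data before. I need a way to split out each group and create a key so that, once the groups are split and their fields defined, they can be joined back together.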
data TRANS;
  set mpancreate_a;
  by DataItmGrp NOTSORTED;
  if first.DataItmGrp then
    do;
      if DataItmGrp = "TRANS" then
        TRANSKey+1;
    end;
run;

data TRANS;
  set TRANS;
  TRANSKey2 + 1;
  by DataItmGrp NOTSORTED;
  if first.DataItmGrp then
    do;
      if DataItmGrp = "TRANS" then
        TRANSKEY2=1;
    end;
run;

data MTPNT;
  set TRANS;
  by DataItmGrp NOTSORTED;
  if first.DataItmGrp then
    do;
      if DataItmGrp = "MTPNT" then
        MTPNTKEY+1;
    end;
run;

data MTPNT;
  set MTPNT;
  by MTPNTKEY NOTSORTED;
  if first.MTPNTKEY and DataItmGrp = "MTPNT" then
    MTPNTKEY2=0;
  MTPNTKEY2+1;
run;

data ASSET;
  set MTPNT;
  IF MTPNTKEY = 0 THEN
    MTPNTKEY2=0;
  by DataItmGrp NOTSORTED;
  if first.DataItmGrp then
    do;
      if DataItmGrp = "ASSET" then
        ASSETKEY+1;
    end;
run;

data ASSET;
  set ASSET;
  by ASSETKEY NOTSORTED;
  if first.ASSETKEY and DataItmGrp = "ASSET" then
    ASSETKEY2=0;
  ASSETKEY2+1;
  IF ASSETKEY =0 THEN
    ASSETKEY2=0;
run;
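I have also tried manipulating the infile so that all the data for each TRANS appears on one line, but I still have problems applying the fields, and the ordering is paramount.
I have managed to get some keys for some of the groups, but after splitting them out they do not join back together completely.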
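I was hoping to get a counter for each group that is found, retained for that particular group, but I cannot work out how to step into and out of the groupings according to the hierarchy above.
My hope is that once I have these keys I can split the data by group and then left join the pieces back together.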
        _n_  TRANS  TRANS2  MTPNT  MTPNT2
TRANS     1      1       0      0       0
MTPNT     2      2       1      1       1
ASSET     3      3       1      2       1
METER     4      4       1      3       1
READG     5      5       1      4       1
MTPNT     6      6       1      1       2
ASSET     7      7       1      2       2
METER     8      8       1      3       2
READG     9      9       1      4       2
APPNT    10     10       1      5       2
TRANS    11      1       2      6       2
MTPNT    12      2       2      1       3
ASSET    13      3       2      2       3
METER    14      4       2      3       3
READG    15      5       2      4       3
MTPNT    16      6       2      1       4
ASSET    17      7       2      2       4
METER    18      8       2      3       4
READG    19      9       2      4       4
APPNT    20     10       2      5       4
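One possible way to get retained, per-level counters of the kind described here is a single DATA step built on sum statements. This is a minimal sketch rather than a tested solution. It assumes mpancreate_a is still in original file order and that DataItmGrp holds the record type. The key names and the output data set name keyed are made up for illustration. Child keys restart at 0 under each new parent, so a row is identified by the combination of keys rather than by any single one.

```sas
/* Sketch: assumes mpancreate_a is in file order and DataItmGrp is the record type. */
data keyed;                          /* hypothetical output name */
  set mpancreate_a;
  /* Variables used in sum statements are retained automatically and start at 0. */
  select (DataItmGrp);
    when ("TRANS") do;               /* new transaction: restart all child keys */
      TRANSKey + 1;
      MTPNTKey = 0;
      ASSETKey = 0;
      REGSTKey = 0;
    end;
    when ("MTPNT") do;               /* new meter point within the transaction */
      MTPNTKey + 1;
      ASSETKey = 0;
      REGSTKey = 0;
    end;
    when ("ASSET") do;               /* new asset within the meter point */
      ASSETKey + 1;
      REGSTKey = 0;
    end;
    when ("REGST") do;               /* new register within the asset */
      REGSTKey + 1;
    end;
    otherwise;                       /* other record types inherit the current keys */
  end;
run;
```

Every row then carries the keys of its parents, so splitting by DataItmGrp and joining back on TRANSKey, MTPNTKey and ASSETKey should be possible without relying on row order.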
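Inputting hierarchical data from a data file that has no definitive markers is problematic. My best advice is to understand which salient values you want to extract, and in what context you want to know them. For this problem, the simplest first approach is a single monolithic table with categorical variables that capture the path descending to the salient value (the meter reading).

A more complicated case is one in which the first token of each line drives both the input for that line and the output table it belongs to. Because there are no landmarks for the absolute or relative position in the hierarchy of some items (such as NAME and MKPRT), there is no 100% reliable way to place them in the hierarchy, and that in turn affects the placement of items read from subsequent data lines.

Depending on the true complexity of the real-world data and how well it adheres to the rules, you may or may not "miss" reading some values.

Suppose the simpler goal is to get only the meter readings: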
data want;
  length tier level1-level6 $8 path $64 meterReadingString $8 dummy $1;
  retain level1-level5 path;
  attrib readingdate informat=yymmdd10. format=yymmdd10.;
  infile cards dsd missover;

  input @1 tier @;  /* held input: do not advance to the next line yet */

  /* =: compares only the leading characters, so a tier is also accepted  */
  /* when path still points at a deeper tier left over from the previous  */
  /* branch (for example a second ASSET arriving after a READG).          */
  if tier="TRANS" then do;
    level1 = tier;
    call missing (of level2-level6);
    path = catx("/", of level:);
  end;

  if tier="MTPNT" and path =: "TRANS" then do;
    level2 = tier;
    call missing (of level3-level6);
    path = catx("/", of level:);
  end;

  if tier="ASSET" and path =: "TRANS/MTPNT" then do;
    level3 = tier;
    call missing (of level4-level6);
    path = catx("/", of level:);
  end;

  if tier="METER" and path =: "TRANS/MTPNT/ASSET" then do;
    level4 = tier;
    call missing (of level5-level6);
    path = catx("/", of level:);
  end;

  if tier="REGST" and path =: "TRANS/MTPNT/ASSET/METER" then do;
    level5 = tier;
    call missing (level6);
    path = catx("/", of level:);
  end;

  if tier="READG" and path =: "TRANS/MTPNT/ASSET/METER/REGST" then do;
    level6 = tier;
    path = catx("/", of level:);
    input @1 tier readingdate dummy meterReadingString @;  /* reread the line according to its tier */
    meterReading = input(meterReadingString, best12.);
    if path = "TRANS/MTPNT/ASSET/METER/REGST/READG" then OUTPUT;
  end;
datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
run;
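You can use this as the basis for a more complex reader that has a different output data set for each tier, or tier-to-tier path, that is encountered. Each tier would need its own input statement, similar to the way READG is read.

Why not assign the numbers above as a code? You could then aggregate at any level by taking a specific substring of that variable.

What does the actual data file look like? Since some hierarchies can appear at different tiers (such as NAME/ADDRS and NAME/CONTM at 1.1.1.5 and 1.3), you may need a table of a generic value-attribute type with node identity or value-ownership links. There need to be rules for knowing, while the data is being read, which level (or tier) a value belongs to (such as space indentation or special marker characters in the data file).

Data added above. @Richard your comment is correct. Considering that most of my ETL is done with stock DI Studio, that is why I was failing.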
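The multi-output reader suggested in the answer could look roughly like the sketch below. It is an illustration under stated assumptions, not a definitive implementation: the field names after each tier token (transRef, mtpntAction, mtpntId, assetAction) are guesses at what the columns mean, only the first few fields of each tier are read, and METER, REGST and APPNT are skipped to keep it short. The keys are retained counters that restart under each new parent.

```sas
/* Sketch only: variable names after each tier token are hypothetical. */
data trans (keep=TRANSKey transRef)
     mtpnt (keep=TRANSKey MTPNTKey mtpntAction mtpntId)
     asset (keep=TRANSKey MTPNTKey ASSETKey assetAction)
     readg (keep=TRANSKey MTPNTKey ASSETKey readingDate meterReading);
  length tier $8 transRef $16 mtpntAction assetAction $8
         dummy $1 readingString $8;
  informat readingDate yymmdd10.;
  format   readingDate yymmdd10.;
  infile datalines dsd missover;

  input @1 tier @;                   /* peek at the tier and hold the line */
  select (tier);
    when ("TRANS") do;
      TRANSKey + 1;                  /* sum statement: retained, starts at 0 */
      MTPNTKey = 0;                  /* children restart under a new parent  */
      ASSETKey = 0;
      input @1 tier transRef;        /* tier-specific reread of the line */
      output trans;
    end;
    when ("MTPNT") do;
      MTPNTKey + 1;
      ASSETKey = 0;
      input @1 tier mtpntAction mtpntId;
      output mtpnt;
    end;
    when ("ASSET") do;
      ASSETKey + 1;
      input @1 tier dummy assetAction;
      output asset;
    end;
    when ("READG") do;
      input @1 tier readingDate dummy readingString;
      meterReading = input(readingString, best12.);
      output readg;
    end;
    otherwise;                       /* METER, REGST, APPNT are not read in this sketch */
  end;
datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
;
run;

/* Rejoin the pieces on the composite key. */
proc sql;
  create table asset_readings as
  select a.*, r.readingDate, r.meterReading
  from asset as a
  left join readg as r
    on  a.TRANSKey = r.TRANSKey
    and a.MTPNTKey = r.MTPNTKey
    and a.ASSETKey = r.ASSETKey;
quit;
```

Unlike the path-based step in the answer, this sketch does not check that a tier arrives under the expected parent, so a stray record would pick up whatever keys are current. Combining the key counters with the path test would be the more defensive choice.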