Identifying related value pairs in a dataset using SAS
I have a dataset (many rows) containing synonym information for words. A brief sample of the dataset is shown below; it gives the synonym for each word.
Word Synonym
C01 C02
C01 C05
C02 C02
C02 C05
C03 C04
C05 C06
C11 C12
.. ..
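To experiment with the sample rows above, they could be loaded with a DATA step along these lines. This is only a sketch: the dataset name testData and the variable names word and synonym are assumptions, chosen to match the names used in the code further down, and the length of 8 is arbitrary.

```sas
/* Sketch: load the sample word/synonym pairs shown above.        */
/* testData, word and synonym are assumed names; $8 is arbitrary. */
data testData;
    length word synonym $8;
    input word $ synonym $;
    datalines;
C01 C02
C01 C05
C02 C02
C02 C05
C03 C04
C05 C06
C11 C12
;
run;
```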
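From the dataset above, the word-synonym relationships can be determined as follows:
C01-C02-C05-C06
C03-C04
C11-C12
After running the SAS code, I would like a dataset in the following form: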
Word Synonym1 Synonym2 Synonym3
C01 C02 C05 C06
C03 C04
C11 C12
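I tried a series of repeated inner-join steps, but that seems to involve a lot of unnecessary processing.
I could hardly find a good solution for this in SAS (it is easier to solve in other languages). The approach below is not great, because it tries to write an entire group into a single character variable, which will run out of room quickly if you have a large number of records. It also relies on "#" as a delimiter; if your words can contain that character, you will probably want to change it to something else.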
data groups;
    set testData nObs=numObs;
    * Each group is stored as one delimited string, e.g. "#C01#C02#C05#";
    array groups [*] $32767 group1-group100;
    retain groupN 0 group1-group100;
    categorized = 0;

    * Search for the word or synonym in the existing groups;
    if (groupN >= 1) then do;
        do currentGroup = 1 to groupN;
            * Word is in this group but the synonym is not: append the synonym;
            if (index(groups[currentGroup], "#"||strip(word)||"#") and index(groups[currentGroup], "#"||strip(synonym)||"#") = 0) then do;
                groups[currentGroup] = strip(groups[currentGroup])||strip(synonym)||"#";
                categorized = 1;
            end;
            * Synonym is in this group but the word is not: append the word;
            if (index(groups[currentGroup], "#"||strip(word)||"#") = 0 and index(groups[currentGroup], "#"||strip(synonym)||"#")) then do;
                groups[currentGroup] = strip(groups[currentGroup])||strip(word)||"#";
                categorized = 1;
            end;
            * Both are already in this group: nothing to add;
            if (index(groups[currentGroup], "#"||strip(word)||"#") and index(groups[currentGroup], "#"||strip(synonym)||"#")) then do;
                categorized = 1;
            end;
        end;
    end;

    * If the word and synonym were not found in the existing groups, create a new one;
    if (categorized = 0) then do;
        groups[groupN + 1] = "#"||strip(word)||"#"||strip(synonym)||"#";
        groupN = groupN + 1;
    end;

    * On the last observation, split the groups into unique key/value pairs;
    if (_n_ = numObs) then do;
        length key value $200;
        keep key value;
        do currentGroup = 1 to groupN;
            if (not missing(groups[currentGroup])) then do;
                * The first word in the group serves as the key;
                key = scan(groups[currentGroup], 1, '#');
                do j = 2 to countC(groups[currentGroup], '#');
                    value = scan(groups[currentGroup], j, '#');
                    if (not missing(value)) then do;
                        output;
                    end;
                end;
            end;
        end;
    end;
run;

proc sort data = groups;
    by key;
run;

* One row per key, with the group members spread across synonym1, synonym2, ...;
proc transpose data = groups out=result(drop = _:) prefix=synonym;
    by key;
    var value;
run;
Do you have a SAS/OR license? It has a number of procedures for finding connected subgraphs in data like yours.
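As a rough illustration of that suggestion, the sketch below uses PROC OPTNET from SAS/OR with its CONCOMP statement to label connected components, then reshapes the result into one row per group. It assumes SAS/OR is licensed and that the input dataset is named testData with character variables word and synonym, as in the earlier code. The names nodeComp, pairs and result are hypothetical, and none of this has been run against the real data. A self-referencing row such as C02 C02 does not change which words are connected.

```sas
/* Sketch, assuming SAS/OR is available: treat each word/synonym pair */
/* as an undirected link and label the connected components.          */
proc optnet data_links = testData out_nodes = nodeComp;
    data_links_var from = word to = synonym;
    concomp;
run;

/* nodeComp holds one row per word: node (the word) and concomp (its group). */
proc sort data = nodeComp;
    by concomp node;
run;

/* Use the first word of each component as the key; emit the rest as values. */
data pairs;
    set nodeComp;
    by concomp;
    length key $200;
    retain key;
    if first.concomp then key = node;
    else output;
run;

/* One row per key, with the remaining words in Synonym1, Synonym2, ... */
proc transpose data = pairs out = result(drop = _name_ concomp) prefix = Synonym;
    by concomp key;
    var node;
run;
```

Unlike the string-based approach, this does not depend on a delimiter character or on a fixed number of groups, and two groups that later turn out to be linked end up in the same component.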