Searching within a string using SAS

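Given the example below:

I am trying to find out whether any part of the string in column nomvar of table tata exists in col1 of table toto and, if it does, use col2 to supply the definition.

For I2010,RT,IS-IPI,F_CC11_X_CCXBA, I would want the intitule column to read "yes,toto,tata,well".

I thought about using proc sql with insert and select, but I have two tables and would need a join.

I also thought about putting everything into a single table, but I am not sure that is a good idea.

Any suggestions are welcome, as I am badly stuck.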

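The SAS data step hash object is a good way to do this. It lets you read the Toto table into memory and use it as a lookup table. You then use the scan function to tokenize each string from the Tata table, walk through the tokens, and look up the col2 value for each one. Here is the code.

By the way, converting the Tata table into a structure like Toto's and performing a join is also a perfectly reasonable approach.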

/*Create sample data*/
data toto;
   length col1 col2 $ 100;
   col1='I2010';
   col2='yes';
   output;

   col1='RT';
   col2='toto';
   output;

   col1='IS-IPI';
   col2='tata';
   output;

   col1='F_CC11_X_CCXBA';
   col2='well';
   output;
run;

data tata;
   length nomvar intitule $ 100;
   nomvar='I2010,RT,IS-IPI,F_CC11_X_CCXBA';
run;

/*Now for the solution*/
/*You can do this lookup easily with a data step hash object*/
data tata;
  set tata;
  length col1 col2 token $ 100;
  drop col1 col2 token i rc;

  /*slurp the data in from the Toto data set into the hash (first iteration only)*/
  if (_n_ = 1) then do;
     declare hash toto_hash(dataset: 'work.toto');
     rc = toto_hash.definekey('col1');
     rc = toto_hash.definedata('col2');
     rc = toto_hash.definedone();
     /*avoid "uninitialized variable" notes for the hash host variables*/
     call missing(col1, col2);
  end;

  /*now walk the tokens in data set tata and perform the lookup to get each value*/
  intitule = '';
  do i = 1 to countw(nomvar, ',');

     /*grab nth item in the comma-separated list*/
     token = strip(scan(nomvar, i, ','));
     /*lookup the col2 value from the toto data set*/
     rc = toto_hash.find(key: token);
     if (rc = 0) then do;
        /*lookup successful so tack the value on; catx trims blanks and inserts the comma*/
        intitule = catx(',', intitule, col2);
     end;
  end;
run;

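Assuming your data is all structured like this (that is, you are looking at distinct strings separated by a delimiter character), I think the simplest approach is to normalize TATA (split it on that delimiter), do a straight join, and then (if needed) transpose back. (Leaving it vertical may well be better: you will quite possibly find that structure more useful for analysis.)

Once TATA has been split into one row per value, held in a column nomvar_out, you can join on nomvar_out and then (if needed) recombine.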
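The code that accompanied this answer did not survive on this page, so what follows is only a minimal sketch of the normalize, join, and recombine steps it describes, not the answerer's original code. It assumes the sample tables toto (col1, col2) and tata (nomvar) from the first answer and a comma delimiter. The names tata_long, tata_joined, tata_wide, row_id and pos are hypothetical and were chosen for illustration.

```sas
/* Sketch only: tata_long, tata_joined, tata_wide, row_id and pos are
   hypothetical names; toto and tata are the sample tables shown earlier. */

/* 1. Normalize: write one row per comma-separated value */
data tata_long;
   set tata(keep=nomvar);
   length nomvar_out $ 100;
   row_id = _n_;                          /* identifies the source row */
   do pos = 1 to countw(nomvar, ',');     /* pos preserves the original order */
      nomvar_out = strip(scan(nomvar, pos, ','));
      output;
   end;
run;

/* 2. Join each value to its definition in toto */
proc sql;
   create table tata_joined as
   select t.row_id, t.pos, t.nomvar, t.nomvar_out, d.col2
   from tata_long as t
        left join toto as d
        on t.nomvar_out = d.col1
   order by t.row_id, t.pos;
quit;

/* 3. Optional: recombine the definitions into one string per source row */
data tata_wide;
   length intitule $ 100;
   retain intitule;
   set tata_joined;
   by row_id;
   if first.row_id then intitule = '';
   intitule = catx(',', intitule, col2);  /* catx skips missing definitions */
   if last.row_id then output;
   keep row_id nomvar intitule;
run;
```

If the vertical layout is enough for your analysis, you can stop after step 2 and work with tata_joined directly.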

Hi Joe, when you say split on that character, do you mean changing it to a comma? Thanks.

The picture is hard to read, but it looks like the values are separated by ".". What I ultimately mean is: take each value, whatever it is separated by, and write the values out on separate rows, so abc.def.ghi becomes abc, new row, def, new row, ghi.

Hi Joe, it is separated by commas. I will change it to a comma and test.

What if nomvar contains a value that does not exist in col1? Do you want it dropped from intitule? Does the order within intitule matter?

Hello Laurent, toto really acts as a dictionary, so everything present in the tata table should have a match in col1. But suppose something is missing from col1: then intitule should just contain a blank between two commas, or after a comma. And yes, the order does matter.
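The requirement in the last comment (keep the order, and leave a blank slot where a value has no match in col1) can be sketched as a variant of the hash lookup. This is an illustrative sketch, not code from the thread. The output name tata_out is hypothetical, and the '~' placeholder assumes that character never occurs in any col2 value.

```sas
/* Sketch only: tata_out is a hypothetical output name, and '~' is a
   placeholder assumed never to appear in col2. */
data tata_out;
   set tata;
   length col1 col2 token $ 100;
   drop col1 col2 token i rc;

   /* load toto into the hash once */
   if _n_ = 1 then do;
      declare hash toto_hash(dataset: 'work.toto');
      rc = toto_hash.definekey('col1');
      rc = toto_hash.definedata('col2');
      rc = toto_hash.definedone();
      call missing(col1, col2);
   end;

   intitule = '';
   do i = 1 to countw(nomvar, ',');
      token = strip(scan(nomvar, i, ','));
      rc = toto_hash.find(key: token);
      /* no match: hold the slot with the placeholder so catx keeps it */
      if rc ne 0 then col2 = '~';
      intitule = catx(',', intitule, col2);
   end;

   /* turn each placeholder into the single blank asked for */
   intitule = translate(intitule, ' ', '~');
run;
```

The placeholder is needed because catx drops blank items, which would otherwise collapse the empty slot and shift the remaining definitions out of position.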