SAS: get the string between two specific character positions (sas, scanf, extract, substr)


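I have a long text string in SAS that contains a variable-length value. The value always starts with "#" and ends with ",".

Is there a way to extract it and store it as a new variable?

e.g.: word, word, word, #12.34, word, word

I want to get 12.34.

Thanks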

You can use the substr and index functions to do this. The index function returns the position of the first occurrence of the specified character.

data _null_;
var1 = 'word word, word, #12.34, word, word';
pos1 = index(var1,'#'); *Get the position of the first # sign;
tmp = substr(var1,pos1+1); *Create a string that returns only characters after the # sign;
put tmp;
pos2 = index(tmp,','); *Get the position of the first "," in the tmp variable;
var2 = substr(tmp,1,pos2-1);
put var2;
run;

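Note that this method only works when the string contains a single "#", because index returns only the first occurrence.

Another approach is to use a regular expression; the code is given below.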

data have;
infile datalines truncover ;
input var $200.;
datalines;
word word, word, #12.34, word, word
word1 #12.34, hello hi hello hi
word1 #970000 hello hi hello hi #970022, hi
word1 123,  hello hi hello hi #97.99
#99456, this is cool
 ;
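The extraction step for this sample data could be sketched as follows. This is a minimal sketch, not necessarily the code from the original answer: the pattern, the output dataset name want and the variable names rx and value are assumptions made for illustration.

```sas
data want;
  set have;
  length value $50;          /* assumed maximum length of the extracted text */
  retain rx;                 /* keep the compiled pattern across rows */

  /* compile once: a #, then any run of characters other than # or comma, then a comma */
  if _n_ = 1 then rx = prxparse('/#([^#,]*),/');

  /* capture buffer 1 holds the text between # and the comma */
  if prxmatch(rx, var) then value = prxposn(rx, 1, var);
run;
```

Because the character class excludes "#", a "#" that is not followed by a comma before the next "#" is skipped, so the third sample row should yield 970022 rather than text starting at the first "#". Rows with no "#...," pair leave value blank.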

Here is a short note on the regular expression and the functions:

(?

If you only have one #, a double scan should also work:
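A double scan might look like the sketch below. The variable names are illustrative, and the M modifier is an assumption added here so that a "#" at the very start of the string, or an empty value such as "#,", still counts as a delimiter instead of being skipped.

```sas
data _null_;
  var1 = 'word word, word, #12.34, word, word';

  /* inner scan: everything after the first #       */
  /* outer scan: everything before the next comma   */
  var2 = scan(scan(var1, 2, '#', 'M'), 1, ',', 'M');

  put var2=;
run;
```

Like the index approach, this only looks at the first "#", so it is suited to strings with a single "#".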

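One way is to use index to locate the two "sentinels" that delimit the value, and substr to retrieve what lies between them. If the value should be numeric, you additionally need the input function.

A second way is to use the regular expression routines prxmatch and prxposn to locate and extract the embedded value.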

data have;
  input; 
  longtext = _infile_;
datalines;
some thing #12.34, wicked
#, oops
#5a64, oops
# oops
oops ,
oops #
ok #1234,
who wants be a #1e6,aire
space #   , the final frontier
double #12, jeopardy #34, alex
run;

data want;
  set have;

  * locate with index;

  _p1 = index(longtext,'#');
  if _p1 then _p2 = index(substr(longtext,_p1),',');
  if _p2 > 2 then num_in_text = input (substr(longtext,_p1+1,_p2-2), ?? best.);

  * locate with regular expression;

  if _n_ = 1 then _rx = prxparse('/#(\d*\.?\d*)?,/'); retain _rx;
  if prxmatch(_rx,longtext) then do;
    call prxposn(_rx,1,_start,_length);
    if _length > 0 then num_in_text_2 = input (substr(longtext,_start, _length), ?? best.);
  end;

  * drop _: ;
run;

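The regular expression way looks for "#<number>," variants, while the index way looks only for "#...,". With the index way, the input function will then decipher scientific-notation values (such as 1e6) that the regular expression way, with this example pattern, will not "locate".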

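The ?? modifier in the input function prevents "invalid argument" NOTEs from appearing in the log when the enclosed value cannot be parsed as a number.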
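The effect of ?? can be seen in isolation with a small sketch. The variable names are illustrative and the value 5a64 is taken from the sample data above.

```sas
data _null_;
  text = '5a64';

  /* ?? suppresses the invalid-data NOTE and keeps _ERROR_ from being set */
  num = input(text, ?? best.);

  /* num is missing because 5a64 is not a valid number */
  put num=;
run;
```

Without ??, the same call still returns a missing value, but the log receives a NOTE and _ERROR_ is set to 1 for that row.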

If you want to be really lazy, you can do this:

want = compress(have,".","kd");

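Thanks, this works great on its own, but when I try to use it in my data step with a set statement, it returns everything after the "#". Do I need to change the syntax for my data step?

Can you post what your data step looks like? The syntax is the same.

Sorry, it does work! Thanks again. I just couldn't get the right combination of index, scan or substrings!

@J#Lard If you use FIND, which has a start-column parameter, you can modify that parameter to search for multiple matches in the target.

This won't work, because he is looking for a number that starts with # and ends with ,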
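The FIND suggestion in the comments could be sketched like this. It is a hedged illustration rather than code from the thread: the loop structure and the variable names start, p1, p2 and value are assumptions, and the sample string is the last row of the test data.

```sas
data _null_;
  longtext = 'double #12, jeopardy #34, alex';
  start = 1;

  do until (p1 = 0);
    /* look for the next # at or after the current start column */
    p1 = find(longtext, '#', start);
    if p1 > 0 then do;
      /* look for the closing comma after that # */
      p2 = find(longtext, ',', p1 + 1);
      if p2 = 0 then leave;            /* no closing comma: stop searching */
      if p2 > p1 + 1 then do;          /* skip empty values such as "#," */
        value = substr(longtext, p1 + 1, p2 - p1 - 1);
        put value=;
      end;
      start = p2 + 1;                  /* continue after this match */
    end;
  end;
run;
```

Moving the start column past each match is what lets the same FIND call pick up every "#...," pair in the string instead of only the first one.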
