SAS: using the INPUT statement correctly with datalines


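I need to work out the correct input statement to read the data in the datalines.

This is what I tried, using column pointers and position values: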

data oscar;
    input @1 oscardate $ @9 oscaryear @14 budget dollar11. gross dollar13. 
        +1 title $16. +1 asofdate mmddyy10. +1 rating 3.1;
    format asofdate mmddyy10. budget dollar12. gross dollar13.;
    datalines;
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech 02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089 The Hurt Locker 02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195 Slumdog Millionaire 02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505 No Country for Old Men 02/26/2012 8.2
25Feb07 2007 $90,000,000 $289,800,000 The Departed 02/26/2012 8.5
05Mar06 2006 $6,500,000 $98,410,061 Crash 02/26/2012 8.0
;
run;

I want to be able to print the values exactly as they are written in the data lines.

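Use an attrib statement to declare the input variables and specify their attributes. The code will read more clearly (to humans) than a scattering of other attribute-setting statements combined with whatever the input statement might imply.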

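When the data has a character value that contains several internal single spaces, you should either set that value off from the other values with two spaces (and use the & input modifier), or enclose the value in double quotes (and use infile cards dsd dlm=' ').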

For example, with the title delimited by two spaces:

data oscar;
  attrib
    oscardate format=date9. informat=date9.
    oscaryear format=4.
    budget    format=dollar13.0 informat=dollar13.0
    gross     format=dollar13.0 informat=dollar13.0
    title     length=$200
    asofdate  format=mmddyy10. informat=mmddyy10.
    rating    format=4.1
  ;

  input oscardate oscaryear budget gross & title & asofdate rating;
datalines;
27Feb11 2011 $15,000,000 $373,700,000  The Kings Speech  02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089  The Hurt Locker  02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195  Slumdog Millionaire  02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505  No Country for Old Men  02/26/2012 8.2
25Feb07 2007 $90,000,000 $289,800,000  The Departed  02/26/2012 8.5
05Mar06 2006 $6,500,000 $98,410,061  Crash  02/26/2012 8.0
;
run;
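For especially nasty data lines, you may need to issue a bare input; statement, which fills the automatic _infile_ variable with the whole line, and then use Perl regular expression pattern matching to extract the parts you need.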
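
As a rough sketch of that idea for the original single-space lines: the pattern below is an assumption about the layout (it is not part of the answer above) and relies on the date field looking like nn/nn/nnnn so the title can be captured as everything before it. The variable name rx is just illustrative.

```sas
data oscar;
  infile datalines truncover;
  length oscardate oscaryear budget gross 8 title $50 asofdate rating 8;
  format oscardate asofdate yymmdd10. budget gross dollar13. rating 4.1;
  retain rx;
  drop rx;
  /* Compile the pattern once. Seven capture groups, one per field;
     the layout is assumed, adjust it to your real data. */
  if _n_ = 1 then
    rx = prxparse('/^(\S+) (\d{4}) (\S+) (\S+) (.+) (\d\d\/\d\d\/\d{4}) (\S+)\s*$/');
  input;  /* no variables listed: only loads the line into _infile_ */
  if prxmatch(rx, _infile_) then do;
    oscardate = input(prxposn(rx, 1, _infile_), date9.);
    oscaryear = input(prxposn(rx, 2, _infile_), 4.);
    budget    = input(prxposn(rx, 3, _infile_), dollar13.);
    gross     = input(prxposn(rx, 4, _infile_), dollar13.);
    title     = prxposn(rx, 5, _infile_);
    asofdate  = input(prxposn(rx, 6, _infile_), mmddyy10.);
    rating    = input(prxposn(rx, 7, _infile_), 32.);
  end;
datalines;
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech 02/26/2012 8.2
05Mar06 2006 $6,500,000 $98,410,061 Crash 02/26/2012 8.0
;
run;
```

Lines that do not match the pattern are still written out, with every variable missing, so it is probably worth checking the log or the output for such rows.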

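Other data line layouts guarantee that every field is aligned by column, in which case the @ column pointer can be used to read each value starting at a specific column.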


The input statement has so many features that there is no single best approach and no one correct statement.

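The lines as they stand are not formatted well enough to be parsed. They use a space as the delimiter between fields, but the title field also contains spaces. There are many ways to fix the data lines so that parsing them becomes possible.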

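You can convert the lines to use some other delimiter that does not appear in any of the fields. The pipe character is useful for this. A tab ('09'x) is also common.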

27Feb11|2011|$15,000,000|$373,700,000|The Kings Speech|02/26/2012|8.2
You can add quotes around any value that contains the delimiter.
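
A minimal sketch of reading the pipe-delimited form, assuming the remaining lines are converted the same way as the sample line above. The dsd option on the infile statement also lets a quoted value contain the delimiter. The informat and format choices are assumptions borrowed from the other examples here.

```sas
data oscar;
  /* dsd: honor quoted values; dlm='|': fields are separated by pipes */
  infile datalines dsd dlm='|' truncover;
  length oscardate oscaryear budget gross 8 title $50 asofdate rating 8;
  informat oscardate date9. budget gross dollar13. asofdate mmddyy10.;
  format oscardate asofdate yymmdd10. budget gross dollar13. rating 4.1;
  input oscardate oscaryear budget gross title asofdate rating;
datalines;
27Feb11|2011|$15,000,000|$373,700,000|The Kings Speech|02/26/2012|8.2
07Mar10|2010|$11,000,000|$12,647,089|The Hurt Locker|02/26/2012|7.2
;
run;
```

Because title is declared as character in the length statement, plain list input is enough and embedded spaces in the title are no longer a problem.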

You can use fixed-length fields.

27Feb11 2011    $15,000,000 $373,700,000  The Kings Speech        02/26/2012 8.2
07Mar10 2010    $11,000,000  $12,647,089  The Hurt Locker         02/26/2012 7.2
22Feb09 2009    $15,000,000 $141,319,195  Slumdog Millionaire     02/26/2012 8.2
24Feb08 2008    $25,000,000  $74,273,505  No Country for Old Men  02/26/2012 8.2
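
For the fixed-length layout, a sketch along these lines might work. The column positions (1, 9, 17, 29, 43, 67, 78) were counted from the four sample lines above and are an assumption about the real file; if the real data is aligned differently they need to be changed.

```sas
data oscar;
  infile datalines truncover;
  /* @n moves the pointer to column n; each informat gives the field width */
  input @1  oscardate date7.
        @9  oscaryear 4.
        @17 budget    dollar11.
        @29 gross     dollar12.
        @43 title     $22.
        @67 asofdate  mmddyy10.
        @78 rating    3.;
  format oscardate asofdate yymmdd10. budget gross dollar13. rating 4.1;
datalines;
27Feb11 2011    $15,000,000 $373,700,000  The Kings Speech        02/26/2012 8.2
07Mar10 2010    $11,000,000  $12,647,089  The Hurt Locker         02/26/2012 7.2
22Feb09 2009    $15,000,000 $141,319,195  Slumdog Millionaire     02/26/2012 8.2
24Feb08 2008    $25,000,000  $74,273,505  No Country for Old Men  02/26/2012 8.2
;
run;
```

The rating is read with the 3. informat rather than 3.1 because the decimal point is already present in the data.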
If, as in this example, only one field can contain the delimiter, you can put that field at the end of the line.

27Feb11 2011 $15,000,000 $373,700,000 02/26/2012 8.2 The Kings Speech
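
With the title moved to the end, one possible way to read it is list input for the first six fields followed by formatted input for whatever remains. This is a sketch that assumes titles are at most 50 characters; the second data line is the same rearrangement applied to another of the original lines.

```sas
data oscar;
  /* truncover: accept a title shorter than the 50 columns requested */
  infile datalines truncover;
  length oscardate oscaryear budget gross asofdate rating 8 title $50;
  informat oscardate date9. budget gross dollar13. asofdate mmddyy10.;
  format oscardate asofdate yymmdd10. budget gross dollar13. rating 4.1;
  /* $50. reads the rest of the line, spaces included */
  input oscardate oscaryear budget gross asofdate rating title $50.;
datalines;
27Feb11 2011 $15,000,000 $373,700,000 02/26/2012 8.2 The Kings Speech
24Feb08 2008 $25,000,000 $74,273,505 02/26/2012 8.2 No Country for Old Men
;
run;
```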
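For SAS, you can also make sure that the title values never contain two or more adjacent spaces, and then make sure that at least two spaces follow each title. The & modifier on the input statement then tells SAS to allow spaces inside the value but to stop reading the field when it sees two adjacent spaces.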

27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech  02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089 The Hurt Locker  02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195 Slumdog Millionaire  02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505 No Country for Old Men  02/26/2012 8.2
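If you cannot fix the input lines and only one field has embedded spaces, you can parse the line in code. For this one, I would simply read the last three fields into the title value, then pull the date and rating out of it and strip them off.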

data oscar;
  infile datalines truncover;
  length oscardate 8 oscaryear 8 budget 8  gross 8 title $50 asofdate 8 rating 8;
  informat oscardate date. budget gross dollar. asofdate mmddyy. ;
  format oscardate asofdate yymmdd10. budget gross dollar12. rating 4.1 ;
  input oscardate -- gross title $50. ;
  rating = input(scan(title,-1,' '),32.);
  asofdate = input(scan(title,-2,' '),mmddyy10.);
  title = substrn(title,1,length(title)-length(scan(title,-1,' '))-length(scan(title,-2,' '))-2);
datalines;
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech 02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089 The Hurt Locker 02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195 Slumdog Millionaire 02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505 No Country for Old Men 02/26/2012 8.2
25Feb07 2007 $90,000,000 $289,800,000 The Departed 02/26/2012 8.5
05Mar06 2006 $6,500,000 $98,410,061 Crash 02/26/2012 8.0
;
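If you want to write a new version of the data out as parseable text, use the DSD option on the FILE statement together with a simple PUT statement.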

data _null_;
  set oscar;
  file log dsd dlm=' ' ;
  put oscardate -- rating ;
run;

2011-02-27 2011 $15,000,000 $373,700,000 "The Kings Speech" 2012-02-26 8.2
2010-03-07 2010 $11,000,000 $12,647,089 "The Hurt Locker" 2012-02-26 7.2
2009-02-22 2009 $15,000,000 $141,319,195 "Slumdog Millionaire" 2012-02-26 8.2
2008-02-24 2008 $25,000,000 $74,273,505 "No Country for Old Men" 2012-02-26 8.2
2007-02-25 2007 $90,000,000 $289,800,000 "The Departed" 2012-02-26 8.5
2006-03-05 2006 $6,500,000 $98,410,061 Crash 2012-02-26 8.0

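Why are you reading the first date as a character string rather than as a date? Is there any way you can fix the data lines so that they have a proper delimiter? The current lines use a space as the delimiter, but the value of TITLE contains spaces on some lines. If you can fix that, the lines can be read.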