SAS: correct use of INPUT with DATALINES
I need to find the correct INPUT statement to read the data in the DATALINES. I tried pointer controls and column positions:
data oscar;
input @1 oscardate $ @9 oscaryear @14 budget dollar11. gross dollar13.
+1 title $16. +1 asofdate mmddyy10. +1 rating 3.1;
format asofdate mmddyy10. budget dollar12. gross dollar13.;
datalines;
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech 02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089 The Hurt Locker 02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195 Slumdog Millionaire 02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505 No Country for Old Men 02/26/2012 8.2
25Feb07 2007 $90,000,000 $289,800,000 The Departed 02/26/2012 8.5
05Mar06 2006 $6,500,000 $98,410,061 Crash 02/26/2012 8.0
;
run;
I want to be able to print out the values as they are written in the data lines.
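Use an ATTRIB statement to declare the input variables and specify their attributes. The code will read more clearly (to a human) than a scattering of other attribute-setting statements plus the decisions that are left implicit in the INPUT statement.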
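When the data has a character value that contains several single internal spaces, either separate that value from the next value with two spaces (and use the & input modifier), or enclose the value in double quotes (and use the INFILE options DSD DLM=' ').
For example, with each title followed by two spaces: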
data oscar;
  attrib
    oscardate format=date9.     informat=date9.
    oscaryear format=4.
    budget    format=dollar13.0 informat=dollar13.0
    gross     format=dollar13.0 informat=dollar13.0
    title     length=$200
    asofdate  format=mmddyy10.  informat=mmddyy10.
    rating    format=4.1
  ;
  /* & lets TITLE contain single spaces; two adjacent spaces end the value */
  input oscardate oscaryear budget gross title & asofdate rating;
datalines;
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech  02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089 The Hurt Locker  02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195 Slumdog Millionaire  02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505 No Country for Old Men  02/26/2012 8.2
25Feb07 2007 $90,000,000 $289,800,000 The Departed  02/26/2012 8.5
05Mar06 2006 $6,500,000 $98,410,061 Crash  02/26/2012 8.0
;
run;
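For particularly nasty data lines, you may need to execute a bare INPUT statement (input;) to fill the automatic _INFILE_ variable with the whole line, and then use Perl regular expression pattern matching (the PRX functions) to extract the pieces you need.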
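A minimal sketch of that idea, assuming each line ends with an mm/dd/yyyy date followed by the rating, and that everything between the gross amount and that date is the title. The data set name OSCAR_RX and the exact pattern are illustrative choices, not part of the original answer.

```sas
data oscar_rx;
  infile datalines truncover;
  length oscardate oscaryear budget gross 8 title $50 asofdate rating 8;
  format oscardate asofdate yymmdd10. budget gross dollar14.;
  retain re;
  /* Compile the (illustrative) pattern once, on the first iteration.
     Groups: date, year, budget, gross, title, as-of date, rating */
  if _n_ = 1 then
    re = prxparse('/^(\S+) (\d{4}) (\S+) (\S+) (.+) (\d\d\/\d\d\/\d{4}) (\S+)\s*$/');
  input;   /* no variables: only loads the whole record into _INFILE_ */
  if prxmatch(re, _infile_) then do;
    oscardate = input(prxposn(re, 1, _infile_), date7.);
    oscaryear = input(prxposn(re, 2, _infile_), 4.);
    budget    = input(prxposn(re, 3, _infile_), comma16.);  /* strips $ and commas */
    gross     = input(prxposn(re, 4, _infile_), comma16.);
    title     = prxposn(re, 5, _infile_);                  /* greedy .+ keeps inner spaces */
    asofdate  = input(prxposn(re, 6, _infile_), mmddyy10.);
    rating    = input(prxposn(re, 7, _infile_), 8.);
  end;
  else put 'WARNING: line did not match the assumed layout: ' _infile_;
  drop re;
datalines;
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech 02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505 No Country for Old Men 02/26/2012 8.2
05Mar06 2006 $6,500,000 $98,410,061 Crash 02/26/2012 8.0
;
run;
```

Because the greedy (.+) group backtracks until the trailing date and rating match, the title may contain any number of spaces.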
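Another way to lay out the data lines is to align every field in fixed columns, in which case the @ column pointer control can be used to read each value starting at a specific column.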
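INPUT has so many features that there is no single best way or one correct statement.

The current lines are not formatted well enough to be parsed: they use a space as the delimiter between fields, but the title field also contains spaces. There are many ways to fix the data lines so that they can be parsed.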
You could convert the lines to use a different delimiter that does not appear in any field. The pipe character is useful; tab ('09'x) is also common.
27Feb11|2011|$15,000,000|$373,700,000|The Kings Speech|02/26/2012|8.2
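A sketch of reading the pipe-delimited form with list input, assuming every line is converted the same way. The data set name OSCAR_PIPE and the 50-character title length are assumptions; the COMMA informat strips the dollar signs and commas.

```sas
data oscar_pipe;
  /* DSD: two adjacent pipes would mean a missing value */
  infile datalines dlm='|' dsd truncover;
  length oscardate oscaryear budget gross 8 title $50 asofdate rating 8;
  informat oscardate date7. budget gross comma16. asofdate mmddyy10.;
  format oscardate asofdate yymmdd10. budget gross dollar14.;
  input oscardate oscaryear budget gross title asofdate rating;
datalines;
27Feb11|2011|$15,000,000|$373,700,000|The Kings Speech|02/26/2012|8.2
07Mar10|2010|$11,000,000|$12,647,089|The Hurt Locker|02/26/2012|7.2
;
run;
```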
You could add quotes around any value that contains the delimiter.
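Quoting works with a space delimiter too: with the DSD option, list input honors the double quotes and removes them from the stored value. A sketch, assuming the title is quoted whenever it contains a space (the data set name OSCAR_QUOTED is hypothetical). Note that DSD also treats two consecutive delimiters as a missing value, so the fields must be separated by exactly one space.

```sas
data oscar_quoted;
  /* DSD + space delimiter: quoted values may contain spaces; quotes are stripped */
  infile datalines dsd dlm=' ' truncover;
  length oscardate oscaryear budget gross 8 title $50 asofdate rating 8;
  informat oscardate date7. budget gross comma16. asofdate mmddyy10.;
  format oscardate asofdate yymmdd10. budget gross dollar14.;
  input oscardate oscaryear budget gross title asofdate rating;
datalines;
27Feb11 2011 $15,000,000 $373,700,000 "The Kings Speech" 02/26/2012 8.2
05Mar06 2006 $6,500,000 $98,410,061 Crash 02/26/2012 8.0
;
run;
```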
You could use fixed-length fields:
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech       02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089  The Hurt Locker        02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195 Slumdog Millionaire    02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505  No Country for Old Men 02/26/2012 8.2
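With aligned columns, column pointer controls (@n) plus formatted input read each field by position. This sketch pads every field to a fixed width (gross in columns 26 to 37, title in columns 39 to 60, date starting in column 62); the column numbers are assumptions that must be recomputed for any other layout, and the data set name OSCAR_FIXED is hypothetical. The data lines must start in column 1.

```sas
data oscar_fixed;
  infile datalines truncover;
  length oscardate oscaryear budget gross 8 title $22 asofdate rating 8;
  format oscardate asofdate yymmdd10. budget gross dollar14.;
  input @1  oscardate date7.
        @9  oscaryear 4.
        @14 budget    comma11.
        @26 gross     comma12.
        @39 title     $22.      /* trailing padding is just trailing blanks */
        @62 asofdate  mmddyy10.
        @73 rating    3.;
datalines;
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech       02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089  The Hurt Locker        02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195 Slumdog Millionaire    02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505  No Country for Old Men 02/26/2012 8.2
;
run;
```

Because every position is explicit, this layout does not care how many spaces a title contains.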
If, as in this example, only one field can contain the delimiter, move that field to the end of the line:
27Feb11 2011 $15,000,000 $373,700,000 02/26/2012 8.2 The Kings Speech
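With the only space-containing field last, list input can read the first six fields and one formatted read can take whatever is left on the line as the title. A sketch, assuming a maximum title length of 50; the data set name OSCAR_LAST is hypothetical. TRUNCOVER keeps SAS from moving to the next record when a line is shorter than the $50. width.

```sas
data oscar_last;
  infile datalines truncover;
  length oscardate oscaryear budget gross asofdate rating 8 title $50;
  informat oscardate date7. budget gross comma16. asofdate mmddyy10.;
  format oscardate asofdate yymmdd10. budget gross dollar14.;
  /* list input for the six space-free fields, then the rest of the line */
  input oscardate oscaryear budget gross asofdate rating title $50.;
datalines;
27Feb11 2011 $15,000,000 $373,700,000 02/26/2012 8.2 The Kings Speech
24Feb08 2008 $25,000,000 $74,273,505 02/26/2012 8.2 No Country for Old Men
;
run;
```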
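For SAS, you could also make sure that no title contains two or more adjacent spaces, and then put at least two spaces after the title. The & modifier on the INPUT statement then tells SAS to allow single spaces in the value but to stop reading the field when it sees two adjacent spaces.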
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech  02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089 The Hurt Locker  02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195 Slumdog Millionaire  02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505 No Country for Old Men  02/26/2012 8.2
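If you cannot fix the input lines and only one field has embedded spaces, you can parse the line in code. For this data I would read the last three fields into TITLE, then pull out the date and the rating and remove them: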
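data oscar;
  infile datalines truncover;
  /* TITLE must be long enough to hold the title plus the date and rating */
  length oscardate 8 oscaryear 8 budget 8 gross 8 title $50 asofdate 8 rating 8;
  informat oscardate date. budget gross dollar. asofdate mmddyy. ;
  format oscardate asofdate yymmdd10. budget gross dollar12. rating 4.1 ;
  /* List input for the first four fields, then the rest of the line into TITLE */
  input oscardate -- gross title $50. ;
  /* The last two words of TITLE are the rating and the as-of date */
  rating = input(scan(title,-1,' '),32.);
  asofdate = input(scan(title,-2,' '),mmddyy10.);
  /* Drop those two words and the two spaces before them */
  title = substrn(title,1,length(title)-length(scan(title,-1,' '))-length(scan(title,-2,' '))-2);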
datalines;
27Feb11 2011 $15,000,000 $373,700,000 The Kings Speech 02/26/2012 8.2
07Mar10 2010 $11,000,000 $12,647,089 The Hurt Locker 02/26/2012 7.2
22Feb09 2009 $15,000,000 $141,319,195 Slumdog Millionaire 02/26/2012 8.2
24Feb08 2008 $25,000,000 $74,273,505 No Country for Old Men 02/26/2012 8.2
25Feb07 2007 $90,000,000 $289,800,000 The Departed 02/26/2012 8.5
05Mar06 2006 $6,500,000 $98,410,061 Crash 02/26/2012 8.0
;
run;
If you want to write the new version of the data as parseable text, use the DSD option on the FILE statement and a simple PUT statement:
data _null_;
  set oscar;
  file log dsd dlm=' ' ;
  put oscardate -- rating ;
run;
2011-02-27 2011 $15,000,000 $373,700,000 "The Kings Speech" 2012-02-26 8.2
2010-03-07 2010 $11,000,000 $12,647,089 "The Hurt Locker" 2012-02-26 7.2
2009-02-22 2009 $15,000,000 $141,319,195 "Slumdog Millionaire" 2012-02-26 8.2
2008-02-24 2008 $25,000,000 $74,273,505 "No Country for Old Men" 2012-02-26 8.2
2007-02-25 2007 $90,000,000 $289,800,000 "The Departed" 2012-02-26 8.5
2006-03-05 2006 $6,500,000 $98,410,061 Crash 2012-02-26 8.0
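Is there a reason you are reading the first date as a string instead of as a date? Is there any way you can fix the data lines so they have a proper delimiter? The current lines use a space as the delimiter, but the value of TITLE contains spaces on some lines. If you can fix that, the lines can be read.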