How can I read this multi-format data in SAS?


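I have an odd data set that I need to import into SAS, splitting the records into two tables depending on their format and dropping some records. The data is structured as follows: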

c Comment line 1
c Comment line 2
t lines init
a 'mme006'   M   8   99   15   '111 ME - RANDOLPH ST'
  path=no
    dwt=0.01  42427  ttf=1  us1=3  us2=0
    dwt=#0   42350  ttf=1  us1=1.8  us2=0  lay=3
    dwt=>0  42352  ttf=1  us1=0.5  us2=18.13
    42349  lay=3
a 'mme007'   M   8   99   15   '111 ME - RANDOLPH ST'
  path=no
    dwt=+0  42367  ttf=1  us1=0.6  us2=0
    dwt=0.01  42368  ttf=1  us1=0.6  us2=35.63 lay=3
    dwt=#0  42369  ttf=1  us1=0.3  us2=0
    42381  lay=3

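Only lines beginning with `a`, `dwt`, or an integer need to be kept.

For lines beginning with `a`, the desired output is a table called "lines" that contains the first two non-`a` values on the line: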

 name   | type
--------+------
 mme006 | M
 mme007 | M

For the `dwt`/integer lines, the table "itins" should look like this:

 anode | dwt  | ttf | us1 | us2   | lay
 ------+------+-----+-----+-------+-----
 42427 | 0.01 |   1 | 3.0 |  0.00 |
 42350 | #0   |   1 | 1.8 |  0.00 |   3
 42352 | >0   |   1 | 0.5 | 18.13 | 
 42349 |      |     |     |       |   3       <-- line starting with integer
 42367 | +0   |   1 | 0.6 |  0.00 |
 42368 | 0.01 |   1 | 0.6 | 35.63 |   3
 42369 | #0   |   1 | 0.3 |  0.00 |
 42381 |      |     |     |       |   3       <-- line starting with integer

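The problems are:

The "lines" table is correct, except that I cannot get rid of the quotes around the "name" values (e.g. `'mme006'`).

In the "itins" table, "ttf", "us1" and "us2" are populated correctly. However, "anode" and "lay" are always empty, and "dwt" holds values like `#0 4236` and `0.01 42`, always 8 characters long, borrowing part of what should go into "anode".

What am I doing wrong?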

DEQUOTE() will remove the matched quotes.
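As a quick illustration of the DEQUOTE() tip, here is a minimal sketch. The variable names `raw` and `name` and the literal value are only there for demonstration.

```sas
data _null_;
  length raw name $ 12;
  raw  = "'mme006'";      /* value as it arrives from the file, quotes included */
  name = dequote(raw);    /* strips the matching pair of quotes */
  put raw= name=;         /* write both values to the log for comparison */
run;
```

In the log, `name` should appear without the surrounding single quotes.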

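Your problem with `dwt` is that you need to tell SAS which informat to use; so if dwt is four characters long, use `:$4.` rather than leaving the width unspecified.

Anode is a problem, however. The solution I came up with is: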

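data lines itins;
  infile in1 missover;
  /* Peek at the first character and hold the record with the trailing @ */
  input @1 first $1. @;
  if first in ('c','t') then delete;        /* skip comment and header lines */
  else if first='a' then do;
    input name $ type $;
    name=dequote(name);                     /* strip the surrounding quotes */
    output lines;
  end;
  else do;
    input @1 path= $ @;
    if path='no' then delete;               /* skip path= lines */
    else do;
      if substr(_infile_,5,1)='d' then do;
        /* dwt= line: the unnamed anode value lands in dwt and is split off */
        input dwt= :$12. ttf= us1= us2= us3= lay=;
        anode=input(scan(dwt,2,' '),best.);
        dwt=scan(dwt,1,' ');
        output itins;
      end;
      else do;
        /* integer line: anode starts in column 5, optionally followed by lay= */
        input @5 anode 5. lay=;
        output itins;
      end;
    end;
  end;
run;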

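Basically, check for the path line first; then, if it is not a path line, check for the 'd' of dwt. If it is there, read the line as a dwt line, which absorbs anode into dwt, and then split the two apart. If it is not there, just read in anode and lay.

This may not work if the width of dwt is not 2-4 characters (the informat might then need to be shorter); in that case you would have to state explicitly where anode is located in order to read it in correctly.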
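The splitting step described above can be tried in isolation. This is a minimal sketch that assumes named input has left a combined value such as `#0 42350` in dwt; the literal is a hypothetical stand-in, not output captured from the real file.

```sas
data _null_;
  length dwt $ 12;
  dwt   = '#0 42350';                          /* assumed combined value */
  anode = input(scan(dwt, 2, ' '), best12.);   /* second word becomes the numeric anode */
  dwt   = scan(dwt, 1, ' ');                   /* first word is the real dwt value */
  put dwt= anode=;
run;
```

Because SCAN treats consecutive blanks as a single delimiter by default, the same two calls should work regardless of how many spaces separate the two values.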

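I think you are getting into trouble by mixing input styles here. Removing the quotes is easy (`dequote()`), but I don't know that you can solve the other part this way, because `anode` cannot be read with named input.

Thanks for the `dequote()` tip! Do you have any suggestions for an alternative approach that would allow importing `anode` without modifying the input data?

Thanks, @Joe, this seems to work very well. However, the `anode` `lay=` lines produce an error, which (oddly) arises in the `if substr(_infile_,5,1)='d' then do` block:

NOTE: No name defined, '42349 lay'.

You probably need to work on the if condition. That is what the `if/else` is supposed to avoid: identify the lines that are just `anode` + lay, and do not try to use the other input statement on them. It is possible that the `lay=` on the previous line is searching the next line for lay; I'm not entirely sure how that works.

That seems fine in this case, since the anode + `lay=` line is then read correctly by the `else` block, so the output table is actually complete. I'm quite confused by this.

I think it is caused by the previous line's `lay=` looking for the next field to read in (since `flowover` is the default). If you change it to `truncover` or `missover`, it might work (I don't think anything else should be affected, but I'm not sure). Hm, or no, it also happens with `missover`. I'm not sure why SAS looks at the next line there, but I don't have much experience with this style of input.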
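Since the comments above trace the trouble to mixing named input with other input styles, one way to sidestep it is to read each record whole and pick the tokens apart with SCAN(). The following is a rough sketch under that assumption, not the approach from the answer and not tested against the real file. The fileref `in1` comes from the answer's code; the helper variables `token`, `key` and `value` are hypothetical names.

```sas
data lines (keep=name type)
     itins (keep=anode dwt ttf us1 us2 lay);
  infile in1;
  length name $ 12 type $ 4 dwt $ 8 first token key value $ 32;
  input;                                   /* load the record into _infile_ only */
  first = scan(_infile_, 1, ' ');
  if first = 'a' then do;
    name = dequote(scan(_infile_, 2, ' '));
    type = scan(_infile_, 3, ' ');
    output lines;
  end;
  /* keep lines that start with dwt= or with a number; skip everything else */
  else if (first =: 'dwt=') or not missing(input(first, ?? best32.)) then do;
    do i = 1 to countw(_infile_, ' ');
      token = scan(_infile_, i, ' ');
      if index(token, '=') = 0 then anode = input(token, ?? best32.);
      else do;
        key   = scan(token, 1, '=');
        value = scan(token, 2, '=');
        select (key);
          when ('dwt') dwt = value;
          when ('ttf') ttf = input(value, ?? best32.);
          when ('us1') us1 = input(value, ?? best32.);
          when ('us2') us2 = input(value, ?? best32.);
          when ('lay') lay = input(value, ?? best32.);
          otherwise;                       /* ignore any other key */
        end;
      end;
    end;
    output itins;
  end;
run;
```

The design choice is that every token without an `=` on a kept line is treated as anode, so the position of the number within the line no longer matters. Comment, header and `path=` lines fall through both conditions and are dropped.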