Importing a CSV into SAS: "WARNING: A character that could not be transcoded has been replaced in record XXX"

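When I import a large CSV file into SAS, the log always shows "WARNING: A character that could not be transcoded has been replaced in record XXXXX". What should I do?

Thanks in advance.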

/**********************************************************************
*   PRODUCT:   SAS
*   VERSION:   9.4
*   CREATOR:   External File Interface
*   DATE:      06MAR18
*   DESC:      Generated SAS Datastep Code
*   TEMPLATE SOURCE:  (None Specified.)
***********************************************************************/
data WORK.Companies ;
    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
    infile 'E:\PATSTAT\Companies.csv' delimiter = ',' MISSOVER DSD lrecl=13106 firstobs=2 ;
    informat person_id best32. ;
    informat person_name $46. ;
    ...
    informat nuts3 $5. ;
    informat nuts3_name $30. ;
    format person_id best12. ;
    format person_name $46. ;
    ...
    format nuts3 $5. ;
    format nuts3_name $30. ;
    input
        ...
        nuts3 $
        nuts3_name $
    ;
    if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
NOTE: A byte-order mark in the file "E:\PATSTAT\Companies.csv" (for fileref "#LN00025") indicates that the data is encoded in "utf-8". This encoding will be used to process the file.
NOTE: The infile 'E:\PATSTAT\Companies.csv' is: Filename=E:\PATSTAT\Companies.csv, RECFM=V, LRECL=52424, File Size (bytes)=228293377, Last Modified=03 March 2018 19:12:47 o'clock, Create Time=27 November 2017 14:10:57 o'clock
WARNING: A character that could not be transcoded has been replaced in record 775.
WARNING: A character that could not be transcoded has been replaced in record 857.
...
WARNING: A character that could not be transcoded has been replaced in record 10881.
NOTE: Limit set by ERRORS= option reached. Further warnings of this type will not be printed.
NOTE: 1048575 records were read from the infile 'E:\PATSTAT\Companies.csv'.
The minimum record length was 103.
The maximum record length was 680.
NOTE: The data set WORK.COMPANIES has 1048575 observations and 26 variables.
NOTE: DATA statement used (Total process time): real time 7.28 seconds cpu time 3.19 seconds
1048575 rows created in WORK.Companies from E:\PATSTAT\Companies.csv.
NOTE: WORK.COMPANIES data set was successfully created.
NOTE: The data set WORK.COMPANIES has 1048575 observations and 26 variables.

You need to start SAS with Unicode support in order to read UTF-8 characters.


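In your current SAS session, you could try setting ENCODING=ANY on the INFILE or FILENAME statement. The encoding should not affect numeric values. However, if the file really does contain UTF-8 characters that cannot be converted to single-byte WLATIN1 characters, you may not be able to process those strings.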
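As a rough sketch of that suggestion, the generated step could be trimmed down and given an explicit ENCODING= option. The data set name Companies_raw is hypothetical, and only the first two columns are read for illustration, on the assumption that the INPUT order matches the INFORMAT order shown in the generated code. How ENCODING=ANY interacts with the byte-order mark in this file may depend on the SAS release, so treat this as something to try rather than a guaranteed fix.

```sas
/* Sketch only: ask SAS not to transcode the file while reading it.       */
/* Companies_raw is a hypothetical name; only two columns are read here.  */
data WORK.Companies_raw;
    infile 'E:\PATSTAT\Companies.csv'
           delimiter=',' MISSOVER DSD lrecl=13106 firstobs=2
           encoding=any;   /* the option suggested in the answer above */
    informat person_id   best32. ;
    informat person_name $46. ;
    input
        person_id
        person_name $
    ;
run;
```

The same option can instead be placed on a FILENAME statement that the INFILE statement then refers to. Either way, text that is not valid in the session encoding may still display incorrectly.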

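Make sure you are running SAS with Unicode support. What is your setting for the ENCODING system option?

Could you please explain that more clearly? I have only just started using SAS. Thanks in advance. @Tom

How are you running SAS? If you want to be able to handle multi-byte characters, you need to start SAS with the right settings. Otherwise it will try to map the characters into the single-byte encoding the session is using, typically WLATIN1. You can check your setting with this program: proc options option=encoding; run;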
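A minimal sketch of that check, which also writes a message to the log when the session is not UTF-8. The message wording is made up for illustration; GETOPTION and PROC OPTIONS are standard SAS.

```sas
/* Show the ENCODING system option in the log. */
proc options option=encoding;
run;

/* Compare the session encoding with UTF-8 and report the result. */
data _null_;
    length enc $40;
    enc = getoption('ENCODING');
    if upcase(enc) = 'UTF-8' then
        put 'NOTE: Session encoding is UTF-8.';
    else
        put 'WARNING: Session encoding is ' enc 'and not UTF-8, so some characters may be replaced.';
run;
```

ENCODING can only be set when SAS starts, so if this reports WLATIN1 the session has to be restarted with Unicode support. It cannot be switched from inside the running program.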