SAS "BY variables are not properly sorted" error even though the data set is sorted

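I am using SAS on a large data set (>20 GB). When I run a DATA step, I get "BY variables are not properly sorted ..." even though I sorted the data set by the same variables. When I run PROC SORT again, SAS even says "Input data set is already sorted, no sorting done". My code is: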

proc sort data=output.TAQ;
    by market ric date miliseconds descending type order;
run;

options nomprint;

data markers (keep=market ric date miliseconds type order);
    set output.TAQ;
    by market ric date;

    if first.date;

    * ie do the following once per stock-day;
    * Make 1-second markers;

    /*Type="AMARK"; Order=0; * Set order to zero to ensure that markers get placed before trades and quotes that occur at the same milisecond;
    do i=((9*60*60)+(30*60)) to (16*60*60); miliseconds=i*1000; output; end;*/
run;
The error message is:

ERROR: BY variables are not properly sorted on data set OUTPUT.TAQ.
RIC=CXR.CCP Date=20160914 Time=13:47:18.125 Type=Quote Price=. Volume=. BidPrice=9.03 BidSize=400
AskPrice=9.04 AskSize=100 Qualifiers=  order=116458952 Miliseconds=49638125 exchange=CCP market=1
FIRST.market=0 LAST.market=0 FIRST.RIC=0 LAST.RIC=0 FIRST.Date=0 LAST.Date=1 i=. _ERROR_=1
_N_=43297873
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 43297874 observations read from the data set OUTPUT.TAQ.
WARNING: The data set WORK.MARKERS may be incomplete.  When this step was stopped there were
         56770826 observations and 6 variables.
WARNING: Data set WORK.MARKERS was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           1:14.21
      cpu time            26.71 seconds

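If the source data set lives in a database, it may have been sorted there using a different collation sequence.

Try the following before sorting: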

options sortpgm=sas;

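The error occurs deep into the DATA step, at _N_=43297873. To me that suggests PROC SORT worked up to a point and then failed. Without knowing your SAS environment or how OUTPUT.TAQ is stored, it is hard to say what the cause is.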

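Some people have reported resource problems or file system limitations when sorting large data sets.

From an unofficial source:

  • When sorting in the WORK folder, you must have free storage equal to 4 times the size of the data set (5 times under Unix)

  • You may be running out of memory

  • You can use the options MSGLEVEL=i and FULLSTIMER to get more complete information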

Setting options sastraceloc=saslog may also produce useful messages.
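As a rough sketch, the diagnostic options mentioned above could be switched on before re-running the sort. The SASTRACE= setting and NOSTSUFFIX are additions not named in the original answer; they only matter if OUTPUT is a SAS/ACCESS libref pointing at a database, which is an assumption here.

```sas
/* More detailed notes and resource statistics in the log */
options msglevel=i fullstimer;

/* Assumption: only useful when OUTPUT points at a DBMS via SAS/ACCESS. */
/* Writes the SQL passed to the database into the SAS log.              */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sort data=output.TAQ;
    by market ric date miliseconds descending type order;
run;
```

With these set, the log should show the time and memory used by the sort, plus any extra informational notes about it.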

Perhaps you could break it into several steps, for example:

/* Get your market ~ ric ~ date pairs */
proc sql;
   create table market_ric_date as
   select distinct market, ric, date
   from output.TAQ
   /* Possibly an order by clause here on market, ric, date */
; quit;

data millisecond_stuff;
  set market_ric_date; 
  *Possibly add type/order in this step as well?;
  do i=((9*60*60)+(30*60)) to (16*60*60); miliseconds=i*1000; output; end;
run;

/* Possibly a third step here to add type / order if you need to get from original data source */

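I had the same error. The fix was to copy the original table into the WORK library and sort the copy, after which the BY statement started working.

In your case, it would look like this: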

data tmp_TAQ;
    set output.TAQ;
run;

proc sort data=tmp_TAQ;
    by market ric date miliseconds descending type order;
run;

data markers (keep=market ric date miliseconds type order);
    set tmp_TAQ;
    by market ric date;

    if first.date;

    * ie do the following once per stock-day;
    * Make 1-second markers;

    /*Type="AMARK"; Order=0; * Set order to zero to ensure that markers get placed before trades and quotes that occur at the same milisecond;
    do i=((9*60*60)+(30*60)) to (16*60*60); miliseconds=i*1000; output; end;*/
run;

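Do you get an error message in the log when you run PROC SORT? We would definitely need to see more of the log. The observation counts are very odd: you have "if first.date", so markers should be a subset of output.taq, yet when processing stopped about 43.3 million obs had been read from output.taq and about 56.8 million had been written to markers.

There is an OUTPUT statement inside the DO loop, so the obs count can do all sorts of things.

Try adding the FORCE option to PROC SORT. It is possible that SAS thinks the data set is already sorted. Apparently there are cases where a data set's SORTEDBY attribute can be wrong.

Jump to observation #43297873 and check the 5 observations on either side to see why SAS says the data are not in the right order: data check; set output.taq(firstobs=43297868 obs=43297878); run;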
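The suggestions in these comments could be combined roughly as follows. This is only a sketch: it assumes OUTPUT.TAQ is a native SAS data set that can be sorted in place, and the PROC CONTENTS and PROC PRINT steps are additions for inspection rather than something the commenters posted.

```sas
/* Inspect the stored sort metadata (the "Sorted" attribute and */
/* the "Sort Information" section of the output).               */
proc contents data=output.TAQ;
run;

/* FORCE makes PROC SORT re-sort even if the data set is flagged as sorted */
proc sort data=output.TAQ force;
    by market ric date miliseconds descending type order;
run;

/* Look at the observations around the one named in the error message */
data check;
    set output.TAQ (firstobs=43297868 obs=43297878);
run;

proc print data=check;
    var market ric date miliseconds type order;
run;
```

If the rows printed from check are visibly out of order on market, ric, or date, the sorted flag was wrong; if they look correct, a collation difference between the sort and the DATA step becomes the more likely explanation.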