Linux: does compressing a SAS data set increase its size?


I wrote some code that creates a SAS data set with the COMPRESS=YES option. However, the resulting data set ends up larger after compression, as the log shows:

proc sql;
   create table seg.KRG_EO_PVS_CUST_PROD_&op_cyc.
   (
      COMPRESS = YES
   ) as
   select
      W6DFFTE1.DIB_CUST_ID length = 8
         format = 15.
         informat = 15.
         label = 'The logical customer id',
      W6DFFTE1.DIB_PROD_ID length = 8
         format = 15.
         informat = 15.
         label = 'The product id',
      case when W5TM24S0.OFFER_FLAG = "1" then "1" else "0" end as OFFER_FLAG length = 1,
      sum(W6DFFTE1.TOT_QUANTITY) as TOT_QUANTITY length = 8
         format = 10.
         informat = 5.
         label = 'Number of items purchased'
   from
      work.W6DFFTE1 left join
      work.W5TM24S0
         on
         (
            W5TM24S0.DIB_STORE_ID = W6DFFTE1.DIB_STORE_ID
            and W5TM24S0.DIB_SCAN_ID = W6DFFTE1.DIB_SCAN_ID
         )
   group by
      W6DFFTE1.DIB_CUST_ID,
      W6DFFTE1.DIB_PROD_ID,
      W5TM24S0.OFFER_FLAG
   ;
NOTE: Compressing data set SEG.KRG_EO_PVS_CUST_PROD_20150701 increased size by 43.27 percent.
      Compressed is 1961732 pages; un-compressed would require 1369265 pages.
NOTE: Table SEG.KRG_EO_PVS_CUST_PROD_20150701 created, with 346423801 rows and 4 columns.

I would just like to know what could cause this.

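SAS compression is quite simple: compress=yes only lets SAS save disk space by not writing the bytes for the unused length of character variables. Your data appears to be three numeric variables plus one single-character variable. That leaves compression almost nothing to work with, yet it still has to add whatever record-format overhead a compressed file carries, so the file ends up larger.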
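To put the log's numbers in perspective, here is a rough back-of-the-envelope check. It is written in Python rather than SAS purely for the arithmetic; the page counts and row count are copied from the log above, and the record width is simply the sum of the declared lengths (it ignores any padding SAS may add).

```python
# Figures copied from the SAS log in the question.
compressed_pages = 1_961_732
uncompressed_pages = 1_369_265
rows = 346_423_801

# Declared variable lengths: three 8-byte numerics plus one 1-byte character.
# Assumption: this ignores any alignment padding SAS may add per observation.
record_bytes = 8 + 8 + 8 + 1

# Relative growth of the compressed file, in percent.
increase_pct = (compressed_pages / uncompressed_pages - 1) * 100

# How many observations fit on a page in each layout.
rows_per_page_plain = rows / uncompressed_pages
rows_per_page_compressed = rows / compressed_pages

print(f"declared record width: {record_bytes} bytes")
print(f"size increase: {increase_pct:.2f}%")
print(f"rows per page, uncompressed: {rows_per_page_plain:.1f}")
print(f"rows per page, compressed:   {rows_per_page_compressed:.1f}")
```

The percentage matches the 43.27 percent in the NOTE, and the rows-per-page figures (roughly 253 uncompressed versus about 177 compressed) show that each compressed observation takes up more room on the page than the plain one did, which is what you would expect when there is only one character byte to squeeze and some overhead is added per record.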

If you need to compress the file for medium- or long-term storage, you are better off using a separate zip or tar utility.
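As an illustration of compressing outside SAS, here is a minimal sketch using Python's standard gzip module instead of a command-line zip or tar tool. The helper name gzip_file and the file path are hypothetical, and the data set must not be in use by SAS while it is being copied.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path) -> Path:
    """Write a gzip copy of src next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    # Stream the file in chunks so a very large data set is never held in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    # Hypothetical path; point this at the real .sas7bdat file.
    dataset = Path("krg_eo_pvs_cust_prod_20150701.sas7bdat")
    if dataset.exists():
        archive = gzip_file(dataset)
        print(archive, archive.stat().st_size, "bytes")
```

The original file is left in place, so the sizes can be compared before deciding whether to delete it. SAS cannot read the .gz copy directly; it has to be decompressed again before use.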

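Edit: I am not trying to disparage SAS compression. I believe the designers cared more about keeping disk access relatively fast than about providing true zip-style compression.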