awk: program limit exceeded: maximum number of fields size=32767


When I run my shell script on Ubuntu 14.04, I get the following error:

user@user-Lenovo-IdeaPad-Z410:~/Thesis/BOOKS$ bash Training_POS_Uni_Bi.sh
awk: program limit exceeded: maximum number of fields size=32767
    FILENAME="ensemble_features/Training_BOOKS_POS_Bigram_with_stemming_BOOLEAN_FVT.csv" FNR=1 NR=1
cut: invalid byte, character or field list
Try 'cut --help' for more information.
-1
cut: invalid byte, character or field list
Try 'cut --help' for more information.
6656
My script is below:

cd /home/user/Thesis/BOOKS/Features/Training/POSITIVE/
fname="ensemble_features"
mkdir $fname

cp /home/user/Thesis/BOOKS/Features/Training/POSITIVE/Training_BOOKS_POS_unigram_FVT_with_stemming_BOOLEAN.csv ensemble_features/
cp /home/user/Thesis/BOOKS/Features/Training/POSITIVE/Training_BOOKS_POS_Bigram_with_stemming_BOOLEAN_FVT.csv ensemble_features/


mkdir "proces"
cnt=0
for file in $fname/*
do
    #Number of columns
    num=`awk 'BEGIN {FS=",";c=0};{if (c==0 ){print NF; c=1}}END{}' $file`
    if [[ cnt -eq 0 ]];then
        cut -d, -f $num $file >class.csv
        cnt=1;
    fi
    num=$((num-1))
    echo $num
    nfname=`basename $file`

    #Cut the columns
    cut -d',' -f1-$num $file > proces/cutlast$nfname
done
#Paste multiple csv
paste -d',' proces/* > comb.csv
paste -d, comb.csv class.csv > Training_BOOKS_Unigram_Bigram_POS_Ensemble_Features_BOOLEAN.csv
rm comb.csv
rm class.csv
rm -r proces
rm -r ensemble_features

My input files contain 38453 and 6656 columns respectively. Can anyone help me fix this error?

On Ubuntu, awk is a symlink that can point to one of several awk variants, and the default these days is mawk. Try installing gawk instead: gawk has no limit on the number of fields in a record.
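
A minimal sketch of how that switch might look on Ubuntu, assuming the stock apt packages and that awk is managed through the alternatives system (it usually is):

sudo apt-get install gawk               # GNU awk; no fixed field limit, unlike mawk
sudo update-alternatives --config awk   # pick gawk as the target of /usr/bin/awk
readlink -f "$(command -v awk)"         # confirm which implementation awk now resolves to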


By the way, if you have the time to learn it, Python may be a better long-term solution.

Why not avoid awk? Python (import csv), for example, might bring you better luck. How would I edit my program? +1, and I'd also suggest gawk, which has no field limit. Correct, gawk has no maximum number of columns. That aside, Python runs slower than awk, which is why it is not a replacement for larger files.
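
If switching awk implementations is not an option, the column count in the loop can also be computed without awk's field splitting at all. A sketch, reusing the $file loop variable from the script above and assuming no field contains an embedded comma:

# Split the first line on commas and count the pieces; this sidesteps
# mawk's 32767-field limit entirely.
num=$(head -n 1 "$file" | tr ',' '\n' | wc -l)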