AWK:如何合并CSV文件并删除包含特定值的行?
我有数百个CSV文件。每个CSV文件与此类似:AWK:如何合并CSV文件并删除包含特定值的行?,csv,awk,Csv,Awk,我有数百个CSV文件。每个CSV文件与此类似: | KEYWORD | NUMBER OF COMPS | AVGE M E (K) | GS/M | EST. A SE/M | C CORE | |---------|-----------------|--------------|------|-------------|--------| | Apples | 311 | 12 | N/A | <100 | 10
| KEYWORD | NUMBER OF COMPS | AVGE M E (K) | GS/M | EST. A SE/M | C CORE |
|---------|-----------------|--------------|------|-------------|--------|
| Apples | 311 | 12 | N/A | <100 | 10 |
| Bananas | >1,200 | 737 | N/A | 490 | 88 |
| Oranges | 48 | 184 | N/A | N/A | 1 |
| Fruits | 161 | 94 | N/A | - | 6 |
这很好用
但是,我不确定合并后如何删除不需要的行。
到目前为止,我有:
awk '$5 !~ /(<100|N\/A|-)/' ^0-output.csv > ^0-output.csv
Sample4.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Apples,311,12,N/A,<100,10
Bananas,">1,200",737,N/A,490,88
Oranges,48,184,N/A,N/A,1
Fruits,161,94,N/A,-,63
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Dino,588,67,N/A,888,234
Thunder,">1,200",211,N/A,<100,77
Ninja,95,37,N/A,-,878
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Blur,84,2454,N/A,-,234
"KEYWORD","NUMBER OF COMPS","AVGE M E (K)","GS/M","EST. A SE/M","C CORE"
"hedgehog rolls ròund",32,481,N/A,"878",13
"Clever Fox jumps Hîgh",233,83,N/A,"<100",12
"Bear à lot",122,35,N/A,"-",11
"kitten hîgh life","121","673","32","N/A","15"
(注意:当最终的CSV文件以Apple数字打开时,预期输出是否保留包装引号并不重要)
预期输出:(可读格式)
环境:
我正在使用Mac OS X 10.14.6。我无法安装awk的其他版本。编辑:根据OP的评论,“
之间可能也有一个逗号,因此最好使用GNUawk
编写和测试的FPAT
awk -v FPAT='[^,]*|"[^"]+"' '
{ sub(/\r$/,"") }
FNR==1{
if(NR==1){ print }
next
}
$5=="<100"||$5=="N/A"||$5=="-"{
next
}
1
' *.csv
或者,如果您的值也可能包含其他内容,并且您希望使用正则表达式来匹配您希望忽略的值,那么请尝试以下操作
awk '
BEGIN{
FS=OFS=","
}
FNR==1{
if(NR==1){ print }
next
}
$5~/<100/ || $5~/N\/A/ || $5~/-/{ next }
1
' *.csv
awk'
开始{
FS=OFS=“,”
}
FNR==1{
如果(NR==1){print}
下一个
}
$5~/您可以使用&&
将两个条件合并为一个:
awk -F, 'NR==1 || (FNR>1 && $5 !~ /^(<100|N\/A|-)$/)' *.csv > output.csv
在我看来,您只对测试倒数第二个字段感兴趣,而且倒数第二个字段和倒数第二个字段都不能包含逗号,因此只需从每行的末尾而不是开始计算字段编号,然后您就不必关心前面的字段是否包含逗号。鉴于此,这将使用任何awk:
$ awk -F',' '(NR==1) || (FNR>1 && $(NF-1)!~/^"?(<100|N\/A|-)"?$/)' *.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Bananas,">1,200",737,N/A,490,88
Dino,588,67,N/A,888,234
"hedgehog rolls ròund",32,481,N/A,"878",13
$awk-F',''(NR==1)| |(FNR>1和&$(NF-1)!~/^?'(1200),737,不适用,490,88
迪诺,588,67,不适用,888234
“刺猬卷”,32481,不适用,“878”,13
让我们,让我们。
awk '
BEGIN{
FS=OFS=","
}
FNR==1{
if(NR==1){ print }
next
}
$5=="<100"||$5=="N/A"||$5=="-"{ next }
1
' *.csv
awk '
BEGIN{
FS=OFS=","
}
FNR==1{
if(NR==1){ print }
next
}
$5~/<100/ || $5~/N\/A/ || $5~/-/{ next }
1
' *.csv
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS="," ##Setting field separator as comma here.
}
FNR==1{ ##Checking condition if its firt line of current Input_file then do following.
if(NR==1){ print } ##If its very first line of very first Input_file then print that line.
next ##next will skip all further statements from here.
}
$5=="<100"||$5=="N/A"||$5=="-"{ next } ##Checking condition if 5th field contains either <100 OR N/A OR - then skip all further statements.
1 ##awk'sh way to print the current line.
' *.csv ##Passing all .csv files to awk program from here.
awk -F, 'NR==1 || (FNR>1 && $5 !~ /^(<100|N\/A|-)$/)' *.csv > output.csv
awk -v FPAT='"[^"]*"|[^,]*' '
NR == 1 || (FNR > 1 && $5 !~ /^(<100|N\/A|-)*$/)' *.csv > output.csv
$ awk -F',' '(NR==1) || (FNR>1 && $(NF-1)!~/^"?(<100|N\/A|-)"?$/)' *.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Bananas,">1,200",737,N/A,490,88
Dino,588,67,N/A,888,234
"hedgehog rolls ròund",32,481,N/A,"878",13