AWK: How do I merge CSV files and remove rows containing specific values?

I have hundreds of CSV files. Each CSV file looks similar to this:

| KEYWORD | NUMBER OF COMPS | AVGE M E (K) | GS/M | EST. A SE/M | C CORE |
|---------|-----------------|--------------|------|-------------|--------|
| Apples  | 311             | 12           | N/A  | <100        | 10     |
| Bananas | >1,200          | 737          | N/A  | 490         | 88     |
| Oranges | 48              | 184          | N/A  | N/A         | 1      |
| Fruits  | 161             | 94           | N/A  | -           | 6      |
Merging the files works fine.

However, I am not sure how to remove the unwanted rows after merging. So far I have:

awk '$5 !~ /(<100|N\/A|-)/' ^0-output.csv > ^0-output.csv

Sample input files (each header line starts a new file; the last one is Sample4.csv):

KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Apples,311,12,N/A,<100,10
Bananas,">1,200",737,N/A,490,88
Oranges,48,184,N/A,N/A,1
Fruits,161,94,N/A,-,63
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Dino,588,67,N/A,888,234
Thunder,">1,200",211,N/A,<100,77
Ninja,95,37,N/A,-,878
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Blur,84,2454,N/A,-,234
"KEYWORD","NUMBER OF COMPS","AVGE M E (K)","GS/M","EST. A SE/M","C CORE"
"hedgehog rolls ròund",32,481,N/A,"878",13
"Clever Fox jumps Hîgh",233,83,N/A,"<100",12
"Bear à lot",122,35,N/A,"-",11
"kitten hîgh life","121","673","32","N/A","15"
(Note: it does not matter whether the expected output keeps the wrapping quotes, since the final CSV file will be opened in Apple Numbers.)

Expected output (readable format):

Environment: I am using Mac OS X 10.14.6. I cannot install other versions of awk.

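EDIT: As per the OP's comments, there may also be a comma inside a double-quoted field, so it is better to use FPAT. The following was written and tested in GNU awk: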

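awk -v FPAT='[^,]*|"[^"]+"'  '
{ sub(/\r$/,"") }
FNR==1{
  if(NR==1){ print }
  next
}
# FPAT keeps surrounding quotes in the field, so strip them before comparing.
{ val=$5; gsub(/^"|"$/,"",val) }
val=="<100"||val=="N/A"||val=="-"{
  next
}
1
' *.csv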

Or, in case your values may contain something else as well and you want to use a regex to match the values to ignore, try the following:

awk '
BEGIN{
  FS=OFS=","
}
FNR==1{
  if(NR==1){ print }
  next
}
$5~/<100/ || $5~/N\/A/ || $5~/-/{ next }
1
'  *.csv
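
One caveat with the regex variant above: the patterns are unanchored, so $5~/-/ skips any row whose fifth field merely contains a hyphen, not only rows where the field is exactly -. A minimal sketch of the difference, using made-up rows (the keywords Range and Dash are hypothetical and not part of the question's data):

```shell
# Hypothetical rows: field 5 is "10-20" in one row and exactly "-" in the other.
rows='Range,1,2,N/A,10-20,3
Dash,1,2,N/A,-,3'

# Unanchored: matches a hyphen anywhere in field 5, so both rows are skipped.
unanchored=$(printf '%s\n' "$rows" | awk -F, '$5 !~ /-/')

# Anchored: matches only when the whole field is "-", so "10-20" is kept.
anchored=$(printf '%s\n' "$rows" | awk -F, '$5 !~ /^-$/')

printf 'unanchored kept: [%s]\n' "$unanchored"
printf 'anchored kept:   [%s]\n' "$anchored"
```

If only exact values should be dropped, anchoring the pattern with ^ and $ is probably the safer choice.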

You can use && to combine the two conditions into one:

awk -F, 'NR==1 || (FNR>1 && $5 !~ /^(<100|N\/A|-)$/)' *.csv > output.csv
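
Note that output.csv itself matches *.csv, so if the command is run a second time in the same directory, the previous result is picked up by the glob as one of the inputs. One way to sidestep this is to write the result to a name the glob does not match. The sketch below assumes a scratch directory and the stand-in file names a.csv, b.csv and merged.out, all of which are illustrative and not from the question:

```shell
# Work in a scratch directory so the sketch does not touch real data.
dir=$(mktemp -d)

# Two tiny stand-in input files with the same header as in the question.
printf '%s\n' 'KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE' \
              'Apples,311,12,N/A,<100,10' \
              'Dino,588,67,N/A,888,234' > "$dir/a.csv"
printf '%s\n' 'KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE' \
              'Blur,84,2454,N/A,-,234' > "$dir/b.csv"

# merged.out is not matched by *.csv, so running the command twice is harmless.
awk -F, 'NR==1 || (FNR>1 && $5 !~ /^(<100|N\/A|-)$/)' "$dir"/*.csv > "$dir/merged.out"
awk -F, 'NR==1 || (FNR>1 && $5 !~ /^(<100|N\/A|-)$/)' "$dir"/*.csv > "$dir/merged.out"

cat "$dir/merged.out"
```

Once the run has finished, the file can be renamed to a .csv name, or written to a different directory from the start.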

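It looks to me like you are only interested in testing the second-to-last field, and neither the last nor the second-to-last field can contain a comma. So just count the field number from the end of each line instead of the start, and then you do not have to care whether the earlier fields contain commas. Given that, this will work using any awk: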

$ awk -F',' '(NR==1) || (FNR>1 && $(NF-1)!~/^"?(<100|N\/A|-)"?$/)' *.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Bananas,">1,200",737,N/A,490,88
Dino,588,67,N/A,888,234
"hedgehog rolls ròund",32,481,N/A,"878",13
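
To see why counting from the end helps, consider the Bananas row from the sample data. With a plain comma separator the quoted ">1,200" is split into two fields, so $5 ends up holding N/A from the GS/M column and a test on $5 would wrongly drop the row, while $(NF-1) still points at 490. A minimal sketch:

```shell
# The Bananas row from the sample data; the quoted ">1,200" contains a comma.
row='Bananas,">1,200",737,N/A,490,88'

# Print the field count, the 5th field, and the second-to-last field.
result=$(printf '%s\n' "$row" | awk -F',' '{ print NF, $5, $(NF-1) }')

printf '%s\n' "$result"
```

Here the row has 7 comma-separated pieces instead of 6, which is why a fixed index from the left is unreliable for this data.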

Explanation of the plain comma-separated version, annotated line by line:

awk '                                        ##Start of the awk program.
BEGIN{                                       ##Start of the BEGIN section.
  FS=OFS=","                                 ##Set the input and output field separators to a comma.
}
FNR==1{                                      ##If this is the first line of the current input file, do the following.
  if(NR==1){ print }                         ##If it is also the very first line of the very first file, print it (the header).
  next                                       ##Skip all further statements for this line.
}
$5=="<100"||$5=="N/A"||$5=="-"{ next }       ##If the 5th field is exactly <100, N/A or -, skip the line.
1                                            ##Idiomatic awk way to print the current line.
'  *.csv                                     ##Pass all .csv files to the awk program.
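
With GNU awk, the same single condition can be combined with FPAT so that quoted fields containing commas stay intact. Optional quotes around the value are allowed for, since FPAT keeps them as part of the field:

awk -v FPAT='"[^"]*"|[^,]*' '
NR == 1 || (FNR > 1 && $5 !~ /^"?(<100|N\/A|-)"?$/)' *.csv > output.csv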