Regex: I'm unable to process data with awk
I am trying to process data with awk but cannot get the correct result. Please let me know if I have gone wrong somewhere.
Data (test.txt):
"A","B","ls",,"This,is,the,test",T,
"k",O,"mv",,"This,is,the,2nd test","L",
"C",J,"cd",,"This,is,the,3rd test",,
awk 'BEGIN { FS=","; OFS="|" } { nf=0; delete f; while ( match($0,/([^,]+)|(\"[^\"]+\")/) ) { f[++nf] = substr($0,RSTART,RLENGTH); $0 = substr($0,RSTART+RLENGTH); }; print f[2],f[3],f[4],f[5] }' test.txt
Output:
But the output should look like this:
"B"|"ls"||"This,is,the,test"|T
O|"mv"||"This,is,the,2nd test"|"L"
J|"cd"||"This,is,the,3rd test"|
With the new input, using any awk:
$ cat tst.awk
BEGIN { FS=","; OFS="|" }
{
    # 1) Replace all FSs inside quotes with the value of RS
    #    since we know that RS cannot be present in any record:
    head = ""
    tail = $0
    while( match(tail,/"[^"]+"/) ) {
        trgt = substr(tail,RSTART,RLENGTH)
        gsub(FS,RS,trgt)
        head = head substr(tail,1,RSTART-1) trgt
        tail = substr(tail,RSTART+RLENGTH)
    }
    $0 = head tail

    # 2) re-compile the record to replace FSs with OFSs:
    $1 = $1

    # 3) restore the RSs within quoted fields to FSs:
    gsub(RS,FS)

    # 4) remove the first and last fields:
    gsub("^[^" OFS "]*[" OFS "]|[" OFS "][^" OFS "]*$","")

    print
}
$ awk -f tst.awk file
"B"|"ls"||"This,is,the,test"|T
O|"mv"||"This,is,the,2nd test"|"L"
J|"cd"||"This,is,the,3rd test"|
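Comments:

If you never access any fields, why bother setting FS? Why don't you expect "A" in the output? Nothing in your script skips the first field. Is your question why you don't get the empty | field? That is because [^,]+ requires a field to have at least one character between the commas.

I see you just updated your input to include fields without double quotes. Do you have any other major surprises to tell us about, such as newlines and/or escaped quotes inside quoted fields?

Or, simpler but worse: awk -v FPAT='"[^"]*"' -v OFS='|' '{print $2,$3,"",$4}' file. Note that using FPAT requires GNU awk.

Thanks for the reply. I have updated the records above: I removed the double quotes from the second column. Please let me know how to process the file.

This solution assumes all of your non-empty values are double-quoted. Perhaps you could update your input sample to clarify the requirements?

I can't solve the problem with the awk version I have.

It is possible, it just takes more work. By the way, you are using a very old awk and you really should upgrade it, because you are missing a lot of very useful functionality. Can you upgrade your awk version?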
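The point about `[^,]+` dropping empty fields can be illustrated with a field-by-field loop that works in any POSIX awk. This is only a sketch, not the answerer's method: the function name `csv_fields` is hypothetical, and it assumes that quoted fields contain no escaped quotes or newlines and that every field is followed directly by a comma or the end of the line.

```shell
# Hypothetical helper: split each input line into CSV-style fields,
# keeping empty fields, then print fields 2..6 joined by "|".
csv_fields() {
    awk '
    BEGIN { OFS = "|" }
    {
        split("", f)          # portable way to clear the array
        nf = 0
        rest = $0
        while (1) {
            # A field is a quoted string, a run of non-commas, or empty.
            if (match(rest, /^"[^"]*"/) || match(rest, /^[^,]+/))
                len = RLENGTH
            else
                len = 0       # empty field: nothing before the next comma
            f[++nf] = substr(rest, 1, len)
            rest = substr(rest, len + 1)
            if (rest == "") break
            rest = substr(rest, 2)   # assumed to be the separating comma
        }
        print f[2], f[3], f[4], f[5], f[6]
    }'
}

printf '%s\n' '"A","B","ls",,"This,is,the,test",T,' | csv_fields
```

For the sample line this prints `"B"|"ls"||"This,is,the,test"|T`. Running `csv_fields < test.txt` would process the whole file. Because the empty case is handled explicitly instead of relying on a `+` quantifier, the empty fourth column survives.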
awk -F\" '{q="\""; print q$4q"|"q$6q"||"q$8q}'
awk -vFPAT='"[^"]*"' '{$0=$2"|"$3"||"$4}1' FILE
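To see why the `-F\"` one-liner picks fields 4, 6, and 8, it can help to print every field awk produces when the double quote is the separator. The helper name `show_quote_fields` below is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: list each field awk sees when '"' is the separator.
show_quote_fields() {
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

# All non-empty values quoted: B, ls and the long text land in $4, $6, $8.
printf '%s\n' '"A","B","ls",,"This,is,the,test",T,' | show_quote_fields

# Second column unquoted: the numbering shifts and $4 is now mv.
printf '%s\n' '"k",O,"mv",,"This,is,the,2nd test","L",' | show_quote_fields
```

The second call shows why this approach depends on every non-empty value being double-quoted: once a column loses its quotes, the even-numbered fields no longer line up with the CSV columns.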