Regex: I'm not able to process data with awk

I am trying to process data with awk but cannot get the correct result. Please let me know if I am going wrong somewhere. Data (test.txt):

"A","B","ls",,"This,is,the,test",T,
"k",O,"mv",,"This,is,the,2nd test","L",
"C",J,"cd",,"This,is,the,3rd test",,

awk  'BEGIN { FS=","; OFS="|" }  { nf=0; delete f; while ( match($0,/([^,]+)|(\"[^\"]+\")/) ) { f[++nf] = substr($0,RSTART,RLENGTH); $0 = substr($0,RSTART+RLENGTH); };  print f[2],f[3],f[4],f[5] }' test.txt 
Output:

But the output should look like this:

"B"|"ls"||"This,is,the,test"|T
O|"mv"||"This,is,the,2nd test"|"L"
J|"cd"||"This,is,the,3rd test"|

With your new input, and using any awk:

$ cat tst.awk
BEGIN { FS=","; OFS="|" }
{
    # 1) Replace all FSs inside quotes with the value of RS
    #    since we know that RS cannot be present in any record:
    head = ""
    tail = $0
    while( match(tail,/"[^"]+"/) ) {
        trgt = substr(tail,RSTART,RLENGTH)
        gsub(FS,RS,trgt)
        head = head substr(tail,1,RSTART-1) trgt
        tail = substr(tail,RSTART+RLENGTH)
    }
    $0 = head tail

    # 2) re-compile the record to replace FSs with OFSs:
    $1 = $1

    # 3) restore the RSs within quoted fields to FSs:
    gsub(RS,FS)

    # 4) remove the first and last fields:
    gsub("^[^" OFS "]*[" OFS "]|[" OFS "][^" OFS "]*$","")

    print
}

$ awk -f tst.awk file
"B"|"ls"||"This,is,the,test"|T
O|"mv"||"This,is,the,2nd test"|"L"
J|"cd"||"This,is,the,3rd test"|
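Step 2 of the script above relies on a general awk behavior: assigning to any field makes awk rebuild $0 by joining the fields with OFS. A minimal sketch of that one step in isolation follows; the function name `rebuild` and the sample record are made up for illustration and are not part of the answer.

```shell
# Hypothetical helper: split a record on "," and re-join it with "|".
rebuild() {
    printf '%s\n' "$1" |
    awk 'BEGIN { FS = ","; OFS = "|" } { $1 = $1; print }'
}

rebuild 'a,b,,d'    # prints a|b||d
```

Without the `$1 = $1` assignment the record would be printed unchanged, commas included, because awk only re-joins the fields with OFS after a field has been modified.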

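Why bother setting FS if you never access any fields? Why don't you want "A" in the output? Nothing in your script skips the first field. Is your question why you don't get the empty | field? That's because [^,]+ requires a field to contain at least one character between the commas.

I see you just updated your input to include fields without double quotes. Do you have any other major surprises for us, such as newlines and/or escaped quotes inside quoted fields?

Or, simpler and cruder:
awk -v FPAT='"[^"]*"' -v OFS='|' '{print $2,$3,"",$4}' file
Note that the use of FPAT requires GNU awk.

Thanks for the reply. I have updated the records above: I removed the double quotes from the second column. Please let me know how to process that file.

This solution assumes that all of your non-empty values are double-quoted. Maybe you could update your input sample to clarify the requirements?

I can't solve the problem with the awk version I have.

It is possible, it just takes more work. By the way, you are using a very old awk; you really should upgrade it, since you are missing a lot of very useful functionality. Can you upgrade your awk version?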
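As the comment about [^,]+ points out, the loop in the question can never produce an empty field. One possible repair, sketched here under the assumption that quoted fields contain no escaped quotes or newlines, is to consume exactly one field plus one comma per iteration and to let the unquoted pattern match the empty string. The function name `fields_2_to_6` is hypothetical, and this is not the approach taken in the answer above.

```shell
# Hypothetical helper: print fields 2-6 of each CSV-like line, "|"-separated.
fields_2_to_6() {
    awk 'BEGIN { OFS = "|" }
    {
        nf = 0
        split("", f)                 # portable way to clear the array
        rest = $0
        while (rest != "") {
            # Try a quoted field first; otherwise take everything up to
            # the next comma, which may be the empty string.
            if (!match(rest, /^"[^"]*"/))
                match(rest, /^[^,]*/)
            f[++nf] = substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 2)   # skip the field and one comma
        }
        print f[2], f[3], f[4], f[5], f[6]
    }' "$@"
}

printf '%s\n' '"A","B","ls",,"This,is,the,test",T,' | fields_2_to_6
# prints "B"|"ls"||"This,is,the,test"|T
```

This sketch uses only match(), substr(), and split(), so it should not need GNU awk. A trailing empty field after the last comma is not stored in the array, but it still prints as empty because an unset array element evaluates to the empty string.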
awk -F\" '{q="\""; print q$4q"|"q$6q"||"q$8q}'
awk -vFPAT='"[^"]*"' '{$0=$2"|"$3"||"$4}1' FILE