Awk: check whether the number in column 1 equals column 2, where column 1 must start and end with a given format

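I want to check whether the number in column 1 equals column 2. Column 1 should start with "ABC" and end with "DEF", but sometimes it ends with "DEFZ" plus a number instead (as in the sample below). The digits between "ABC" and "DEF" should match column 2. Can anyone help me?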

My input:

ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC95678DEF|45678|23132331331| 
ABC87887DEF|86187|23132331331|
ABC89043DEF|89043|23132331331|
ABC89043DEFZ1|89043|23132331331|
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|
ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC89043DEFZ1|89043|23132331331|    
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|
The output should be:

ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC95678DEF|45678|23132331331| 
ABC87887DEF|86187|23132331331|
ABC89043DEF|89043|23132331331|
ABC89043DEFZ1|89043|23132331331|
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|
ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC89043DEFZ1|89043|23132331331|    
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|
I am trying the command below, but it does not work:

awk -F '|' '"ABC" $2 "DEF" == $1 && "ABC" $2 "DEFZ"+[0-9] == $1 { print }' WHTFile.txt > QC2Valid.txt
Can anyone help me? Thanks in advance.

awk -v FS="|" '{tmpvar=$1;gsub(/^ABC|DEF(Z[0-9]+)?$/,"",tmpvar)}tmpvar == $2' infile
Input

akshay@db-3325:/tmp$ cat infile
ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC95678DEF|45678|23132331331|
ABC87887DEF|86187|23132331331|
ABC89043DEF|89043|23132331331|
ABC89043DEFZ1|89043|23132331331|
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|
Output

akshay@db-3325:/tmp$ awk -v FS="|" '{tmpvar = $1; gsub(/^ABC|DEF(Z[0-9]+)?$/,"",tmpvar)} tmpvar == $2' infile
ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC89043DEF|89043|23132331331|
ABC89043DEFZ1|89043|23132331331|
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|
Explanation

awk -v FS="|" '{                  # use | as the field separator
                 tmpvar = $1      # copy the first field into tmpvar

                 # delete a leading ABC and a trailing DEF,
                 # where DEF may be followed by Z and one or more digits,
                 # so that tmpvar keeps only the digits between ABC and DEF
                 gsub(/^ABC|DEF(Z[0-9]+)?$/, "", tmpvar)
               }
               # a pattern with no action prints the record when it is true,
               # so the line is printed only if tmpvar equals the second field
               tmpvar == $2
              ' infile
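
To see what the gsub step leaves behind, here is a minimal sketch that runs the same regex on a few first-field values. The helper name strip_affixes and the last two sample values are made up for this demo; they are not part of the question's data.

```shell
# strip_affixes is a hypothetical helper name used only for this demo.
# It applies the same gsub as above to each input line and shows before/after.
strip_affixes() {
  awk '{ orig = $0; gsub(/^ABC|DEF(Z[0-9]+)?$/, ""); print orig " -> " $0 }'
}

printf '%s\n' ABC12345DEF ABC12345DEFZ1 ABC12345DEFZ27 XABC12345DEFX | strip_affixes
# ABC12345DEF -> 12345
# ABC12345DEFZ1 -> 12345
# ABC12345DEFZ27 -> 12345
# XABC12345DEFX -> XABC12345DEFX
```

The last value is left untouched because ABC is not at the start of the string and DEF is not followed by the end of the string (or by Z plus digits and then the end).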

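  • /^ABC|DEF(Z[0-9]+)?$/ has two alternatives separated by |

  • 1st alternative: ^ABC
    ^ asserts the position at the start of the string
    ABC matches the characters ABC literally (case sensitive)

  • 2nd alternative: DEF(Z[0-9]+)?$
    DEF matches the characters DEF literally (case sensitive)
    (Z[0-9]+)? is a group; the ? quantifier matches it zero or one time, preferring one (greedy)
    Z matches the character Z literally (case sensitive)
    [0-9] matches a single digit from 0 to 9
    + matches the preceding item one or more times, as many as possible (greedy)
    $ asserts the position at the end of the string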
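
A possible variation, closer to what the question's attempt was going for, is to build the expected pattern from the second field and match the first field against it. This is only a sketch: match_rows is a hypothetical helper name, the sample rows are a subset of the question's input plus one made-up row (12345|12345|...), and it assumes the second field contains only digits, since any regex metacharacters in it would be interpreted as part of the pattern.

```shell
# match_rows is a hypothetical helper name used only for this demo.
# It prints rows whose first field is ABC + <field 2> + DEF, optionally
# followed by Z and one or more digits. Assumes field 2 holds only digits.
match_rows() {
  awk -F'|' '$1 ~ ("^ABC" $2 "DEF(Z[0-9]+)?$")' "$@"
}

printf '%s\n' \
  'ABC12345DEF|12345|23132331331|' \
  'ABC12345DEFZ1|12345|23132331331|' \
  'ABC95678DEF|45678|23132331331|' \
  'ABC89043DEFZ3|89043|23132331331|' \
  '12345|12345|23132331331|' | match_rows
# ABC12345DEF|12345|23132331331|
# ABC12345DEFZ1|12345|23132331331|
# ABC89043DEFZ3|89043|23132331331|
```

Unlike the gsub version, this variant also rejects a row whose first field lacks the ABC prefix or DEF suffix entirely, such as the made-up last row: with gsub nothing is stripped from it, so the remaining text still equals the second field and the row would be printed.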


Hey Akshay, everything works fine. Thank you very much.