Awk: compare multiple columns and replace only on a match


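I have two files (FILE1 and FILE2). I am trying to compare the strings in columns 1 and 2 of FILE1 with columns 4 and 5 of FILE2. In addition to this match, column 6 of FILE2 must match certain strings such as SO or CO (because columns 3 and 4 of FILE1 are SO and CO respectively). When everything matches, column 7 of FILE2 should be replaced with the corresponding FILE1 value (column 3 for SO, column 4 for CO); otherwise the line should be left unchanged.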

  • I tried to modify and use solutions posted on the forum for similar problems, but without success.

    FILE1
    type  code     SO  CO other
    
    7757    1       6941.958        138.922 149.17
    7757    2       8666.123        198.908 225.67
    7757    4       2795.885        334.875 378.68
    7759    GT3     222.104    13.5    734.62
    7768    CT2     0       0       0
    7805    6       3796.677        75.175  79.09 
    
    FILE2
    "US","01073",,"7757","1","SO","10","299"
    "US","01073",,"7758","1","SO","10","299"
    "US","01073",,"7757","1","NO","10","299"
    "US","01073",,"7757","1","CO","10","299"
    "US","01073",,"7757","4","MO","10","299"
    "US","01073",,"7757","1","GO","10","299"
    "US","01073",,"7805","6","CO","10","299"
    
    Required output:
    "US","01073",,"7757","1","SO","6941.958","299"
    "US","01073",,"7758","1","SO","10","299"
    "US","01073",,"7757","1","NO","10","299"
    "US","01073",,"7757","1","CO","138.922","299"
    "US","01073",,"7757","4","MO","10","299"
    "US","01073",,"7757","1","GO","10","299"
    "US","01073",,"7805","6","CO","75.175","299"
    
    The solution I tried (for CO only):

    tr -d '"' < FILE2 > temp   # remove the double quotes
    awk 'NR==FNR{A[$1,$2]=$3;next} A[$4,$5] && $6=="CO" {$7=A[$1,$2]; print}' FS=" " OFS="," FILE1 temp > out
    

    • Compound awk solution:

      awk 'function unquote(f){ 
               return substr(f, 2, length(f)-2) 
           }
           NR==FNR{ 
               if (NR==1){ f3=$3; f4=$4 }
               else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
               next; 
           }
           { k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
           k in a{ $7=a[k] }1' file1 FS=',' OFS=',' file2
      
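      • function unquote(f){…}
        - unquotes the field, i.e. extracts the value between the double quotes (strictly speaking, everything between the first and last characters of the string)

      • a[$1,$2,f3]=$3; a[$1,$2,f4]=$4
        - stores the FILE1 values under composite keys: (column 1, column 2, "SO") maps to column 3 and (column 1, column 2, "CO") maps to column 4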
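
To make the two points above concrete, here is a small standalone sketch (not part of the original answer). It uses one hard-coded key and hand-written quoted strings instead of the real files. The shell variable name `demo` is only for illustration.

```shell
demo=$(awk 'function unquote(f) { return substr(f, 2, length(f) - 2) }
     BEGIN {
         # Store a value under a three-part key, as the solution does for file1.
         a["7757", "1", "SO"] = "6941.958"

         # Rebuild the same key by hand from quoted, file2-style fields.
         k = unquote("\"7757\"") SUBSEP unquote("\"1\"") SUBSEP unquote("\"SO\"")

         print unquote("\"SO\"")          # the field without its quotes
         print (k in a) ? a[k] : "none"   # lookup through the rebuilt key
     }')
printf '%s\n' "$demo"
```

The comma inside `a[x, y, z]` is shorthand for joining the subscripts with `SUBSEP`, which is why a key assembled manually with `SUBSEP` finds the same array element.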


      Output:

      "US","01073",,"7757","1","SO",6941.958,"299"
      "US","01073",,"7758","1","SO","10","299"
      "US","01073",,"7757","1","NO","10","299"
      "US","01073",,"7757","1","CO",138.922,"299"
      "US","01073",,"7757","4","MO","10","299"
      "US","01073",,"7757","1","GO","10","299"
      "US","01073",,"7805","6","CO",75.175,"299"
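
Note that the replaced values above come out without quotes, while the required output in the question keeps them quoted. If that matters, one possible variant (a sketch, not the answerer's code) wraps the new value in double quotes before assigning it. The scratch directory, the shortened sample files and the variable name `quoted` are assumptions made for the demo.

```shell
# Sample rows copied from the question, written to a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
7805    6       3796.677        75.175  79.09
EOF
cat > "$dir/file2" <<'EOF'
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7805","6","CO","10","299"
EOF

quoted=$(awk 'function unquote(f) { return substr(f, 2, length(f) - 2) }
     NR == FNR {
         if (NR == 1) { f3 = $3; f4 = $4 }            # header row: "SO" and "CO"
         else if (NF) { a[$1, $2, f3] = $3; a[$1, $2, f4] = $4 }
         next
     }
     { k = unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
     k in a { $7 = "\"" a[k] "\"" }                   # re-wrap the new value in quotes
     1' "$dir/file1" FS=',' OFS=',' "$dir/file2")
printf '%s\n' "$quoted"
rm -r "$dir"
```

With these sample rows the SO line gets `"6941.958"`, the CO line gets `"75.175"`, and the NO line passes through untouched.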
      

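      Thanks a lot for helping to edit my code, Randomir!

      Hi RomanPerekhrest, thanks for your help. Your script looks great to me. But I keep getting the same output as "file2", which means nothing is replaced in column 7 of the output. Any hints?

      @kelly, hint: make sure you have posted the actual input samples, since those are what get copied and tested. The solution works fine for the currently posted samples.

      RomanPerekhrest, it was my problem; your code works perfectly. Thank you very much for your help and time.

      @kelly, you're welcome.

      @RomanPerekhrest's solution works perfectly with the test data. However, the actual data in file2 has a problem: column 2 looks like "abc,45" or "abc23", meaning some values have a comma inside the double quotes and some do not. Since I cannot use the double quote as the delimiter here, how should I handle this? Thanks for your help.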
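
Regarding the last comment, one possible workaround (a sketch under assumptions, not something confirmed in the thread) is to address the columns from the end of the line. It assumes that every row has 8 logical columns, that stray commas only occur in column 2, and that columns 4 to 8 never contain commas. The `abc,45` rows below are made up from the description in the comment.

```shell
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
EOF
# Hypothetical rows: column 2 sometimes contains a comma inside its quotes.
cat > "$dir/file2" <<'EOF'
"US","abc,45",,"7757","1","SO","10","299"
"US","abc23",,"7757","1","CO","10","299"
EOF

fixed=$(awk 'function unquote(f) { return substr(f, 2, length(f) - 2) }
     NR == FNR {
         if (NR == 1) { f3 = $3; f4 = $4 }
         else if (NF) { a[$1, $2, f3] = $3; a[$1, $2, f4] = $4 }
         next
     }
     {
         # Logical columns 4, 5 and 6 of an 8-column row, counted from the end,
         # so an extra comma in column 2 does not shift them.
         k = unquote($(NF-4)) SUBSEP unquote($(NF-3)) SUBSEP unquote($(NF-2))
     }
     k in a { $(NF-1) = a[k] }    # logical column 7
     1' "$dir/file1" FS=',' OFS=',' "$dir/file2")
printf '%s\n' "$fixed"
rm -r "$dir"
```

Because the record is rebuilt with the same `,` separator, the split pieces of column 2 are joined back together unchanged. If commas can appear in other columns too, this approach breaks down; GNU awk's `FPAT` variable is designed for that kind of quoted CSV, but it is gawk-specific and not shown here.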