awk: compare 2 files on 2 columns


I have to compare two files using awk. Both files have the same structure: a path, then a checksum.

File1.txt

/content/cr444/commun/      50d174f143d115b2d12d09c152a2ca59be7fbb91
/content/cr764/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbb91
/content/cr999/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbbpp
File2.txt

/content/cr555/test/        51d174f14f6115b2d12d09c152a2ca59be7fbb91
/content/cr764/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbb78
/content/cr999/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbbpp
The expected result is a .csv file, with | as the separator.

One way is to merge the two files with join, then use awk to compare the checksums on each line:

$ join -a1 -a2 -11 -21 -e XXXX -o 0,1.2,2.2 <(sort -k1,1 File1.txt) <(sort -k1,1 File2.txt) |
   awk -v OFS='|' '$2 == "XXXX" { print $1, "", $3, "not in file1"; next }   # path only in File2.txt
                   $3 == "XXXX" { print $1, $2, "", "not in file2"; next }   # path only in File1.txt
                   $2 == $3 { print $1, $2, $3, "same checksum"; next }       # in both, checksums equal
                   { print $1, $2, $3, "not same checksum" }'
/content/cr444/commun/|50d174f143d115b2d12d09c152a2ca59be7fbb91||not in file2
/content/cr555/test/||51d174f14f6115b2d12d09c152a2ca59be7fbb91|not in file1
/content/cr764/commun/|10d174f14fd115b2d12d09c152a2ca59be7fbb91|10d174f14fd115b2d12d09c152a2ca59be7fbb78|not same checksum
/content/cr999/commun/|10d174f14fd115b2d12d09c152a2ca59be7fbbpp|10d174f14fd115b2d12d09c152a2ca59be7fbbpp|same checksum
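To see what the awk stage actually receives, here is a minimal sketch that rebuilds the two sample files from the question in a scratch directory and runs only the join step. It assumes bash (for the <(...) process substitution the answer already uses) and a join that supports -e together with -o, as GNU and POSIX join do. Setting LC_ALL=C is an added assumption, not part of the answer, so that sort and join agree on collation; -1 1 -2 1 is just the spelled-out form of -11 -21.

```shell
#!/usr/bin/env bash
# Minimal sketch: run only the join step of the answer on the sample data.
set -euo pipefail

# Work in a throwaway directory (assumption: mktemp -d is available).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Sample data copied from the question: path, then checksum.
cat > File1.txt <<'EOF'
/content/cr444/commun/      50d174f143d115b2d12d09c152a2ca59be7fbb91
/content/cr764/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbb91
/content/cr999/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbbpp
EOF
cat > File2.txt <<'EOF'
/content/cr555/test/        51d174f14f6115b2d12d09c152a2ca59be7fbb91
/content/cr764/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbb78
/content/cr999/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbbpp
EOF

# Added assumption: byte-order collation so sort and join agree.
export LC_ALL=C

# -a1 -a2       also keep paths that appear in only one file
# -1 1 -2 1     join on field 1 (the path) of each file
# -e XXXX       placeholder for a missing field (used together with -o)
# -o 0,1.2,2.2  print the join field, then the checksum from each file
joined=$(join -a1 -a2 -1 1 -2 1 -e XXXX -o 0,1.2,2.2 \
    <(sort -k1,1 File1.txt) <(sort -k1,1 File2.txt))
printf '%s\n' "$joined"
```

Every line comes out with exactly three space-separated fields (path, File1.txt checksum, File2.txt checksum), with XXXX standing in for whichever side is missing. That fixed shape is what lets the awk rules simply test $2 == "XXXX" and $3 == "XXXX".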

Alternatively, assuming the order of the output lines does not matter, you can:

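  • Collect the lines of File1.txt into an associative array ($1 -> $2).
  • Process the lines of File2.txt:
      • If $1 is in the array from the first step, compare the two checksums, print the result, and delete that entry from the array.
      • If $1 is not in the array from the first step, print the line as "not in file1".
  • Print every item still left in the array from the first step as "not in file2".

    The code: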

    $ awk 'BEGIN{OFS="|"} NR==FNR{f1[$1]=$2; next} {if ($1 in f1) { print $1,f1[$1],$2,($2==f1[$1]?"":"not ")"same checksum"; delete f1[$1]} else print $1,"",$2,"not in file1"} END{for (i in f1) print i,f1[i],"","not in file2"}' File1.txt File2.txt
    
    Output:

    /content/cr555/test/||51d174f14f6115b2d12d09c152a2ca59be7fbb91|not in file1
    /content/cr764/commun/|10d174f14fd115b2d12d09c152a2ca59be7fbb91|10d174f14fd115b2d12d09c152a2ca59be7fbb78|not same checksum
    /content/cr999/commun/|10d174f14fd115b2d12d09c152a2ca59be7fbbpp|10d174f14fd115b2d12d09c152a2ca59be7fbbpp|same checksum
    /content/cr444/commun/|50d174f143d115b2d12d09c152a2ca59be7fbb91||not in file2
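For readability, here is a sketch of the same awk program spread over several lines with comments and run against the sample data. Two parts are additions, not part of the answer: the setup of the sample files in a temporary directory, and the final sort (with LC_ALL=C), which makes the line order repeatable because awk leaves the iteration order of for (p in f1) in the END block unspecified.

```shell
#!/usr/bin/env bash
# Sketch: the awk one-liner from the answer, reformatted with comments.
set -euo pipefail

# Throwaway directory and sample data copied from the question.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
cat > File1.txt <<'EOF'
/content/cr444/commun/      50d174f143d115b2d12d09c152a2ca59be7fbb91
/content/cr764/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbb91
/content/cr999/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbbpp
EOF
cat > File2.txt <<'EOF'
/content/cr555/test/        51d174f14f6115b2d12d09c152a2ca59be7fbb91
/content/cr764/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbb78
/content/cr999/commun/     10d174f14fd115b2d12d09c152a2ca59be7fbbpp
EOF

# Added assumption: C collation so the final sort is deterministic.
export LC_ALL=C

result=$(awk '
BEGIN { OFS = "|" }              # output field separator for the .csv
NR == FNR { f1[$1] = $2; next }  # first file: remember path -> checksum
{                                # second file: look the path up
    if ($1 in f1) {
        print $1, f1[$1], $2, ($2 == f1[$1] ? "" : "not ") "same checksum"
        delete f1[$1]            # handled, so END will not report it
    } else {
        print $1, "", $2, "not in file1"
    }
}
END {                            # paths never seen in the second file
    for (p in f1) print p, f1[p], "", "not in file2"
}' File1.txt File2.txt | sort)
printf '%s\n' "$result"
```

The NR == FNR test is what separates the first file from the second. It misfires if File1.txt is empty, because NR == FNR then also holds for every line of File2.txt.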
    

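What have you tried so far? And if your separator is |, is it really CSV output?
@Shawn Some people now take the C in CSV to mean "character" rather than "comma". In fact, that is how CSV is defined.
I'm still getting used to it; yet another layer of ambiguity around "what is CSV"! :-)
@tony The last line should read /| instead of |.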