awk: compare 2 files and 2 columns
I have to compare two files using awk. Each file has the same structure: path checksum
File1.txt
/content/cr444/commun/ 50d174f143d115b2d12d09c152a2ca59be7fbb91
/content/cr764/commun/ 10d174f14fd115b2d12d09c152a2ca59be7fbb91
/content/cr999/commun/ 10d174f14fd115b2d12d09c152a2ca59be7fbbpp
File2.txt
/content/cr555/test/ 51d174f14f6115b2d12d09c152a2ca59be7fbb91
/content/cr764/commun/ 10d174f14fd115b2d12d09c152a2ca59be7fbb78
/content/cr999/commun/ 10d174f14fd115b2d12d09c152a2ca59be7fbbpp
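The expected result is a .csv (with | as the separator):

One approach is to merge the two files with join and then use awk to compare the checksums on each line: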
$ join -a1 -a2 -1 1 -2 1 -e XXXX -o 0,1.2,2.2 <(sort -k1,1 File1.txt) <(sort -k1,1 File2.txt) |
  awk -v OFS='|' '$2 == "XXXX" { print $1, "", $3, "not in file1"; next }     # path only in File2.txt
                  $3 == "XXXX" { print $1, $2, "", "not in file2"; next }     # path only in File1.txt
                  $2 == $3     { print $1, $2, $3, "same checksum"; next }    # path in both, checksums match
                               { print $1, $2, $3, "not same checksum" }'     # path in both, checksums differ
/content/cr444/commun/|50d174f143d115b2d12d09c152a2ca59be7fbb91||not in file2
/content/cr555/test/||51d174f14f6115b2d12d09c152a2ca59be7fbb91|not in file1
/content/cr764/commun/|10d174f14fd115b2d12d09c152a2ca59be7fbb91|10d174f14fd115b2d12d09c152a2ca59be7fbb78|not same checksum
/content/cr999/commun/|10d174f14fd115b2d12d09c152a2ca59be7fbbpp|10d174f14fd115b2d12d09c152a2ca59be7fbbpp|same checksum
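To see where the XXXX placeholder in the join command above comes from, it can help to run the join stage on its own. The following is only a sketch: the function name join_checksums is made up for illustration, temporary files stand in for the <(...) process substitution so that it also runs in a plain sh, and LC_ALL=C is an added assumption so that sort and join agree on the ordering. It assumes a join that supports -a, -e and -o, such as the one in GNU coreutils.

```shell
# Hypothetical helper (not part of the original answer): run only the join stage.
join_checksums() {
  # Temporary files replace the <(sort ...) process substitutions.
  tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
  # join needs both inputs sorted on the join field (field 1, the path).
  LC_ALL=C sort -k1,1 "$1" > "$tmp1"
  LC_ALL=C sort -k1,1 "$2" > "$tmp2"
  # -a 1 -a 2    : also print lines that have no partner in the other file
  # -e XXXX      : fill fields that are missing for such lines with XXXX
  # -o 0,1.2,2.2 : print the join field, then the checksum from each file
  LC_ALL=C join -a 1 -a 2 -1 1 -2 1 -e XXXX -o 0,1.2,2.2 "$tmp1" "$tmp2"
  status=$?
  rm -f "$tmp1" "$tmp2"
  return "$status"
}
```

Called as join_checksums File1.txt File2.txt on the sample data, the line for /content/cr444/commun/ should carry XXXX in its third field and the line for /content/cr555/test/ should carry XXXX in its second field, which is exactly what the awk conditions test for.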
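I assume the order of the output lines does not matter. Then you can:

1. Collect the lines from File1.txt into an associative array ($1 -> $2).
2. Process the lines from File2.txt:
   - if $1 is in the array from (1), compare the checksums and print accordingly, then delete the entry;
   - if $1 is not in the array from (1), print accordingly.
3. Print all items remaining in the array from (1).

The code: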
$ awk 'BEGIN{OFS="|"} NR==FNR{f1[$1]=$2; next} {if ($1 in f1) { print $1,f1[$1],$2,($2==f1[$1]?"":"not ")"same checksum"; delete f1[$1]} else print $1,"",$2,"not in file1"} END{for (i in f1) print i,f1[i],"","not in file2"}' File1.txt File2.txt
Output:
/content/cr555/test/||51d174f14f6115b2d12d09c152a2ca59be7fbb91|not in file1
/content/cr764/commun/|10d174f14fd115b2d12d09c152a2ca59be7fbb91|10d174f14fd115b2d12d09c152a2ca59be7fbb78|not same checksum
/content/cr999/commun/|10d174f14fd115b2d12d09c152a2ca59be7fbbpp|10d174f14fd115b2d12d09c152a2ca59be7fbbpp|same checksum
/content/cr444/commun/|50d174f143d115b2d12d09c152a2ca59be7fbb91||not in file2
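The one-liner above is dense, so here is a sketch of the same logic spread over several lines with comments. The wrapper function compare_checksums and the trailing sort are assumptions added for illustration: awk does not define the order in which for (p in f1) visits the array, so sorting makes the output reproducible when more than one path is missing from the second file. Like the original, it relies on NR==FNR, which misbehaves if the first file is empty.

```shell
# Hypothetical wrapper (not part of the original answer).
# Usage: compare_checksums File1.txt File2.txt
compare_checksums() {
  awk '
    BEGIN { OFS = "|" }

    # First file: remember the checksum for each path.
    NR == FNR { f1[$1] = $2; next }

    # Second file, path also present in the first file: compare checksums.
    $1 in f1 {
      print $1, f1[$1], $2, ($2 == f1[$1] ? "" : "not ") "same checksum"
      delete f1[$1]   # handled, so it is not reported again in END
      next
    }

    # Second file, path unknown to the first file.
    { print $1, "", $2, "not in file1" }

    # Entries still in f1 never appeared in the second file.
    END { for (p in f1) print p, f1[p], "", "not in file2" }
  ' "$1" "$2" | LC_ALL=C sort   # sort is an added assumption, for stable order
}
```

If the original line order of File2.txt matters more than reproducibility, dropping the final sort restores the behaviour of the one-liner.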
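What have you tried so far?

If your separator is |, is it really CSV output?

@Shawn Some people these days take the C in CSV to mean Character rather than Comma. In fact, CSV is sometimes defined that way. I'm still trying to get used to it. Just what we needed around "what is a CSV"! :-) One more layer of ambiguity.

@tony In the last line, should it read /| rather than |?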