Awk: finding matching IDs in two large files

awk sed

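I have two large files.

file1 has 160 million lines in the following format:

id:email

file2 has 45 million lines in the following format:

id:hash

The task is to find every ID that appears in both files and write the results to a third file in the following format:

email:hash

I tried something like: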

awk -F':' 'NR==FNR{a[$1]=$2;next} {print a[$1]":"$2}' test1.in test2.in > res.in

But it doesn't work :(

Sample file1:

Sample file2:

Expected result:

test00@yahoo.com:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e

Using GNU join and GNU bash:

join -t : -j 1 <(sort -t : -k1,1 file1) <(sort -t : -k1,1 file2) -o 1.2,2.2
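The question of what happens when some ids are missing from one file can be checked on a small scale. The following is a minimal sketch on made-up data: the file names `f1.txt` and `f2.txt`, the ids, and the values are hypothetical and are not the OP's samples. By default `join` prints only the keys that occur in both inputs, so unmatched ids are simply dropped. The sketch writes sorted copies to temporary files instead of using process substitution, so it should also run in a plain POSIX shell, and it sets `LC_ALL=C` on the assumption that byte-order collation is acceptable (`sort` and `join` need to agree on the ordering).

```shell
# Hypothetical sample data; ids 2 and 4 have no partner in the other file.
tmp=$(mktemp -d)
printf '%s\n' '3:c@example.com' '1:a@example.com' '2:b@example.com' > "$tmp/f1.txt"
printf '%s\n' '4:hash4' '3:hash3' '1:hash1' > "$tmp/f2.txt"

# Sort each input on the id field (field 1, ':' as separator).
LC_ALL=C sort -t : -k1,1 "$tmp/f1.txt" > "$tmp/f1.sorted"
LC_ALL=C sort -t : -k1,1 "$tmp/f2.txt" > "$tmp/f2.sorted"

# Join on field 1 (the default) and print email:hash for ids present in both files.
result=$(LC_ALL=C join -t : -o 1.2,2.2 "$tmp/f1.sorted" "$tmp/f2.sorted")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With these inputs only ids 1 and 3 produce output. If the real files are already sorted on the id with the same collation, the two `sort` steps can presumably be skipped.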

Comments:

160 million records may not fit in memory. Are the files sorted by id? If so, join is the better tool for this task.

Yes, they are sorted. But not all ids are present in the second file. Isn't that a problem?

The sample file2 data is not sorted. Should it be?

Try: join -t: -o 1.2,2

If both inputs are sorted (as the OP says they are), this should work on its own: join -t: file1 file2 -o 1.2,2.2 > file3

Another variant that sorts the inputs first:

join -t: <(sort file1) <(sort file2) -o 1.2,2.2

In awk (ignoring the amount of available resources):

$ awk -F':' 'NR==FNR{a[$1]=$2;next} a[$1] {print a[$1]":"$2}' test1.in test2.in
test00@yahoo.com :d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e
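The `a[$1]` guard in the command above is what differs from the original attempt, which printed a `:hash` line for every id that was missing from the first file. One possible variation, sketched below on made-up data (the file names `emails.txt` and `hashes.txt` and all values are hypothetical), reads the smaller id:hash file into the array and streams the larger id:email file, since awk only keeps the first file in memory. It also tests membership with `$1 in h`, which does not create an empty array element the way a bare `a[$1]` reference does. Whether 45 million entries actually fit in memory depends on the machine, so treat this as an assumption rather than a guarantee.

```shell
# Hypothetical sample data; ids 2 and 4 have no partner in the other file.
tmp=$(mktemp -d)
printf '%s\n' '1:a@example.com' '2:b@example.com' '3:c@example.com' > "$tmp/emails.txt"
printf '%s\n' '4:hash4' '3:hash3' '1:hash1' > "$tmp/hashes.txt"

# First file (hashes, the smaller one): remember each hash by its id.
# Second file (emails): print email:hash only for ids seen in the first file.
# "$1 in h" checks membership without adding a new element to the array.
result=$(awk -F: '
    NR == FNR { h[$1] = $2; next }
    $1 in h   { print $2 ":" h[$1] }
' "$tmp/hashes.txt" "$tmp/emails.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The output follows the order of the streamed file, and neither input needs to be sorted.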