Bash: remove duplicate usernames and merge the duplicated columns
I have a few different lists right now; I'll do my best to explain. List 1 looks like this:
user1,host1:port1
user2,host2:port2
user1,host3:port3
I look the usernames up in a database, which returns the following:
user1 email1
user2 email2
user1 email1
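In my example, both files contain duplicate users and emails, but the hosts and ports may all differ. What is the most efficient way to get output like this: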
user1 email1 host1:port1, host3:port3
user2 email2 host2:port2
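I assume this calls for some advanced use of awk, but frankly I'm clueless about this kind of thing. Any help or pointers in the right direction would be greatly appreciated.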
$ cat file1
user1,host1:port1
user2,host2:port2
user1,host3:port3
$ cat file2
user1 email1
user2 email2
user1 email1
$ cat tst.awk
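BEGIN { FS = "[[:space:],]+" }    # split on commas and/or whitespace
# First file (user,host:port): record each distinct host per user.
NR==FNR { user2hosts[$1][$2]; next }
# Second file (user email): repeated users just overwrite the same entry.
{ user2email[$1] = $2 }
END {
    for (user in user2email) {
        printf "%s\t%s\t", user, user2email[user]
        sep = ""
        for (host in user2hosts[user]) {
            printf "%s%s", sep, host
            sep = ", "
        }
        print ""
    }
}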
$ gawk -f tst.awk file1 file2
user1 email1 host1:port1, host3:port3
user2 email2 host2:port2
The above uses GNU awk 4.* for true two-dimensional arrays (arrays of arrays).
Alternatively, use this awk:
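awk -F '[, ]+' '
    # First file (list2, "user email"): remember the whole line per user.
    FNR==NR { a[$1] = $0; next }
    # Second file (list1, "user,host:port"): append each host to that line.
    $1 in a {
        if (!seen[a[$1]])
            seen[a[$1]] = $2
        else
            seen[a[$1]] = seen[a[$1]] ", " $2
    }
    # Output order of "for (i in seen)" is unspecified.
    END { for (i in seen) print i, seen[i] }
' list2 list1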
user2 email2 host2:port2
user1 email1 host1:port1, host3:port3