Using awk to get all values from rows that share the same value in one column
I have a dataset (test-file.csv) with three columns:
node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com
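I would like to extend the header with a count column and rename node to nodes.
In addition, all entries should be sorted by the second column (mail).
In the count column I would like to get the number of occurrences of each mail value, and in nodes all node entries that share the same mail value should be printed (space-separated and sorted alphabetically).
This is what I am trying to achieve: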
contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
I have this awk command:
awk -F"," '
BEGIN{
  FS=OFS=",";
  printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
  counts[$3]++; # Increment count of lines.
  contact[$2]; # contact
}
END {
  # Iterate over all third-column values.
  for (x in counts) {
    printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
  }
}
' test-file.csv | sort --field-separator="," --key=2 -n
However, this is my result :-(
Only the occurrence count works:
,dieter@anything.com,1,nodes
,hans@anything.com,2,nodes
,peter@anything.com,2,nodes
contact,mail,count,nodes
Thanks for your help!
With your shown samples, please try the following. Written and tested in GNU awk.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
  sub(/^[^,]*,/,"")
  $1=$1
  print $0,"count,nodes"
}
FNR>1{
  nf=$2
  mail[nf]=$NF
  NF--
  arr[nf]++
  val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
  for(i in arr){
    print i,mail[i],arr[i],val[i] | "sort -t, -k1"
  }
}
' Input_file
Explanation: a detailed explanation of the above.
awk '  ##Start the awk program.
BEGIN{ FS=OFS="," }  ##Set the input and output field separators to a comma.
FNR==1{  ##On the header line:
  sub(/^[^,]*,/,"")  ##Remove everything up to and including the first comma (drops the node heading).
  $1=$1  ##Reassign the first field to itself so the record is rebuilt with OFS.
  print $0,"count,nodes"  ##Print the remaining headings followed by count and nodes.
}
FNR>1{  ##On every data line:
  nf=$2  ##Store the 2nd field (contact) in nf; it is the grouping key.
  mail[nf]=$NF  ##Remember the last field (mail) for this key.
  NF--  ##Decrease the number of fields by 1, dropping the last field.
  arr[nf]++  ##Count how many lines share this key.
  val[nf]=(val[nf]?val[nf] " ":"")$1  ##Append the 1st field (node) to the space-separated list for this key.
}
END{  ##After all input has been read:
  for(i in arr){  ##Loop over every key in arr.
    print i,mail[i],arr[i],val[i] | "sort -t, -k1"  ##Print contact, mail, count and nodes, piping the lines through sort.
  }
}
' Input_file  ##Name of the input file.
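To see the header rule in isolation, the following sketch feeds only the header line from the question through it. The `$1=$1` reassignment is left out here on the assumption that it is not needed when FS and OFS are identical, since `sub()` on `$0` already re-splits the record. The `header` variable name is only for illustration.

```shell
# Feed only the header line from the question through the header rule.
header=$(printf '%s\n' 'node,contact,mail' | awk '
    BEGIN { FS = OFS = "," }
    FNR == 1 {
        sub(/^[^,]*,/, "")        # $0 becomes "contact,mail"
        print $0, "count,nodes"   # both parts are joined with OFS
    }
')
printf '%s\n' "$header"
```

This should print `contact,mail,count,nodes`, the header the question asks for.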
Second solution: if you want to sort by the second and third fields, try the following.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
  sub(/^[^,]*,/,"")
  $1=$1
  print $0,"count,nodes"
}
FNR>1{
  nf=$2 OFS $3
  NF--
  arr[nf]++
  val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
  for(i in arr){
    print i,arr[i],val[i] | "sort -t, -k1"
  }
}
' Input_file
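Both commands above keep the nodes in input order and sort the output lines starting from the contact field, which happens to match the wanted result for the sample data. As a hedged alternative that does not rely on this, the sketch below sorts the rows by mail and node first and then lets a plain POSIX awk group consecutive lines. The function name `group_by_mail` and the temporary directory are assumptions made for illustration, and the sketch assumes simple CSV without quoted commas.

```shell
# Recreate the sample data from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/test-file.csv" <<'EOF'
node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com
EOF

# group_by_mail is a hypothetical helper name, not part of the answers.
group_by_mail() {
    # $1: CSV file with a header line and the columns node,contact,mail.
    printf '%s\n' 'contact,mail,count,nodes'
    # Drop the header, then order rows by mail (field 3) and node (field 1)
    # so each mail group arrives contiguous and already in alphabetical order.
    tail -n +2 "$1" | LC_ALL=C sort -t, -k3,3 -k1,1 | awk '
        BEGIN { FS = OFS = "," }
        # A new mail value starts a new group: flush the previous one first.
        NR == 1 || $3 != mail {
            if (NR > 1) print contact, mail, count, nodes
            contact = $2; mail = $3; count = 0; nodes = ""
        }
        {
            count++
            nodes = (nodes == "" ? $1 : nodes " " $1)
        }
        # Flush the last group, if there was any input at all.
        END { if (NR > 0) print contact, mail, count, nodes }
    '
}

result=$(group_by_mail "$tmpdir/test-file.csv")
printf '%s\n' "$result"
```

For the sample data this should print the header followed by the Dieter, Hans and Peter lines exactly as listed in the question. Sorting before grouping means awk only has to remember the current group, at the cost of an extra sort pass.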
You can use this GNU awk command:
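awk '
BEGIN {
  FS=OFS=","
  printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
  ++counts[$3] # Increment count of lines.
  name[$3] = $2
  map[$3] = ($3 in map ? map[$3] " " : "") $1
}
END {
  # Iterate over all third-column values.
  PROCINFO["sorted_in"] = "@ind_str_asc";
  for (k in counts)
    print name[k], k, counts[k], map[k]
}
' test-file.csv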
Output:
contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
How can PROCINFO["sorted_in"]="@ind_str_asc" be changed to sort by, for example, the count column?
Sure. Just use PROCINFO["sorted_in"]="@val_num_desc" in the code above.
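As a sketch of that suggestion, the command below is the same GNU awk program with only the traversal order changed, so the `counts` array is walked by value in descending numeric order. It needs gawk, because `PROCINFO["sorted_in"]` is a GNU extension. The function name `by_count` and the temporary directory are assumptions made for illustration.

```shell
# Recreate the sample data from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/test-file.csv" <<'EOF'
node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com
EOF

# by_count is a hypothetical helper name, not part of the answers.
by_count() {
    # $1: CSV file with a header line and the columns node,contact,mail.
    gawk '
        BEGIN {
            FS = OFS = ","
            print "contact", "mail", "count", "nodes"
        }
        NR > 1 {
            ++counts[$3]                                  # occurrences per mail
            name[$3] = $2                                 # contact for this mail
            map[$3] = ($3 in map ? map[$3] " " : "") $1   # space-separated nodes
        }
        END {
            # Walk counts by value, highest count first (GNU awk only).
            PROCINFO["sorted_in"] = "@val_num_desc"
            for (k in counts)
                print name[k], k, counts[k], map[k]
        }
    ' "$1"
}

if command -v gawk >/dev/null 2>&1; then
    by_count "$tmpdir/test-file.csv"
else
    echo "gawk not found; this sketch needs GNU awk" >&2
fi
```

With the sample data the two lines with count 2 should come before the Dieter line with count 1. The relative order of rows with equal counts is not something this sketch controls.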