Using awk to get all values from rows that share the same value in one column


I have a dataset (test-file.csv) with three columns:

node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com
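I would like to extend the header with a count column and rename node to nodes. In addition, all entries should be sorted by the second column (mail). The count column should hold the number of occurrences of each mail value, and the nodes column should list all node entries that share the same value in the mail column (space-separated and sorted alphabetically).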

This is what I am trying to achieve:

contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
I have this awk command:

awk -F"," '
BEGIN{
  FS=OFS=",";
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
    counts[$3]++;     # Increment count of lines.
    contact[$2];      # contact
}
END {
    # Iterate over all third-column values.
    for (x in counts) {
    printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
    }
}
' test-file.csv | sort --field-separator="," --key=2 -n
However, this is my result :-( Only the occurrence count works:

,dieter@anything.com,1,nodes
,hans@anything.com,2,nodes
,peter@anything.com,2,nodes
contact,mail,count,nodes

Thanks for your help!
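
A note on why only the count comes out right in the attempt above: contact[$2] merely references an element indexed by the name and never stores a value, while the END block looks the array up by mail address, so contact[x] is always empty. The fourth field is the literal string "nodes", and the header ends up last because it is piped through sort together with the data rows. The following is a minimal sketch of one possible fix, not the only one. It assumes the sample file from the question, stores the name and the node list under the mail key, and prints the header outside the sort so that it stays on top. The variable name result is only for illustration.

```shell
# Recreate the sample input from the question.
cat > test-file.csv <<'EOF'
node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com
EOF

result=$(
  # Print the header first so it is never fed to sort.
  printf '%s\n' 'contact,mail,count,nodes'
  # Skip the input header, aggregate per mail address, then sort by mail.
  tail -n +2 test-file.csv |
    awk -F, -v OFS=, '
      {
        counts[$3]++          # occurrences of this mail address
        contact[$3] = $2      # store the name under the mail key
        # append the node, space-separated (nodes keep their input order)
        nodes[$3] = (nodes[$3] != "" ? nodes[$3] " " : "") $1
      }
      END {
        for (m in counts)
          print contact[m], m, counts[m], nodes[m]
      }
    ' |
    sort -t, -k2,2
)
printf '%s\n' "$result"
```

The nodes are joined in the order they appear in the file, which happens to be alphabetical for this sample.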

With your shown samples, please try the following. Written and tested in GNU awk:

awk '
BEGIN{ FS=OFS="," }
FNR==1{
  sub(/^[^,]*,/,"")
  $1=$1
  print $0,"count,nodes"
}
FNR>1{
  nf=$2
  mail[nf]=$NF
  NF--
  arr[nf]++
  val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
  for(i in arr){
    print i,mail[i],arr[i],val[i] | "sort -t, -k1"
  }
}
' Input_file
Explanation: a detailed, line-by-line explanation of the above.

awk '                                 ##Starting awk program from here.
BEGIN{ FS=OFS="," }                   ##In BEGIN section setting FS, OFS as comma here.   
FNR==1{                               ##if this is first line then do following.
  sub(/^[^,]*,/,"")                   ##Substituting everything till 1st comma here with NULL in current line.
  $1=$1                               ##Reassigning 1st field to itself.
  print $0,"count,nodes"              ##Printing headers as per need to terminal.
}
FNR>1{                                ##If line is Greater than 1st line then do following.
  nf=$2                               ##Creating nf with 2nd field value here.
  mail[nf]=$NF                        ##Creating mail with nf as index and value is last field value.
  NF--                                ##Decreasing value of current number of fields by 1 here.
  arr[nf]++                           ##Creating arr with index of nf and keep increasing its value with 1 here.
  val[nf]=(val[nf]?val[nf] " ":"")$1  ##Creating val with index of nf and keep adding $1 value in it.
}
END{                                  ##Starting END block of this program from here.
  for(i in arr){                      ##Traversing through arr in here.
    print i,mail[i],arr[i],val[i] | "sort -t, -k1"  ##printing values to get expected output and sorting it also by pipe here as per requirement.
  }
}
' Input_file                          ##Mentioning Input_file name here.
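
The header handling is probably the least obvious part, so it can help to run it in isolation. A small sketch, assuming only the header line from the question (the variable name header is illustrative):

```shell
# Feed just the header line through the FNR==1 logic from the answer.
header=$(printf '%s\n' 'node,contact,mail' |
  awk 'BEGIN { FS = OFS = "," }
       {
         sub(/^[^,]*,/, "")        # drop everything up to the first comma
         print $0, "count,nodes"   # append the two new column names
       }')
printf '%s\n' "$header"
```

Because sub() without a third argument edits $0 directly, the remaining header is already in place when the new column names are appended.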


Second solution: if you want to group (and sort) by the 2nd and 3rd fields together, try the following:

awk '
BEGIN{ FS=OFS="," }
FNR==1{
  sub(/^[^,]*,/,"")
  $1=$1
  print $0,"count,nodes"
}
FNR>1{
  nf=$2 OFS $3
  NF--
  arr[nf]++
  val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
  for(i in arr){
    print i,arr[i],val[i] | "sort -t, -k1"
  }
}
'  Input_file
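
Both solutions above append the nodes in input order, which happens to be alphabetical for the sample. The question asks for alphabetically sorted nodes, so here is a hedged sketch of one portable way to get that: sort the data rows by the first column before aggregating. The file name shuffled.csv and its row order are made up for illustration (they are the question's rows, reordered), and the variable name result is hypothetical as well.

```shell
# The question's rows in a deliberately shuffled order (illustrative input).
cat > shuffled.csv <<'EOF'
node,contact,mail
CCDDA,Hans,hans@anything.com
ABABA,Peter,peter@anything.com
CCCC,Dieter,dieter@anything.com
BBBB,Hans,hans@anything.com
AAAA,Peter,peter@anything.com
EOF

result=$(
  printf '%s\n' 'contact,mail,count,nodes'
  tail -n +2 shuffled.csv |
    LC_ALL=C sort -t, -k1,1 |        # order rows by node before grouping
    awk -F, -v OFS=, '
      {
        key = $2 OFS $3              # group on contact and mail together
        count[key]++
        nodes[key] = (nodes[key] != "" ? nodes[key] " " : "") $1
      }
      END {
        for (key in count)
          print key, count[key], nodes[key]
      }
    ' |
    LC_ALL=C sort -t, -k1,1          # final order by contact
)
printf '%s\n' "$result"
```

Since the rows reach awk already ordered by node, each group's list is built in alphabetical order without any array sorting inside awk.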

You can use this GNU awk script:

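awk '
BEGIN {
   FS=OFS=","
   printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1 {
   ++counts[$3]    # Increment count of lines.
   name[$3]=$2
   map[$3]=($3 in map ? map[$3] " " : "") $1
}
END {
   # Iterate over all third-column values.
   PROCINFO["sorted_in"]="@ind_str_asc";
   for (k in counts)
      print name[k], k, counts[k], map[k]
}
' test-file.csv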
Output:

contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA

How can I change PROCINFO["sorted_in"]="@ind_str_asc" to sort by something else, for example by the count column?

Sure. Just use PROCINFO["sorted_in"]="@val_num_desc" in the code above.
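
If GNU awk is not available, roughly the same ordering can be obtained after the fact with sort. The sketch below is an illustration under stated assumptions, not part of the answer: it takes the expected output from the question as a file (grouped.csv is a made-up name), keeps the header on top, and sorts the data rows by count in descending numeric order, using the mail column as a tie-breaker.

```shell
# The grouped result from the question, saved to an illustrative file.
cat > grouped.csv <<'EOF'
contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
EOF

result=$(
  head -n 1 grouped.csv                                   # header stays first
  tail -n +2 grouped.csv | LC_ALL=C sort -t, -k3,3nr -k2,2  # count desc, then mail
)
printf '%s\n' "$result"
```

Here -k3,3nr restricts the key to the third field and compares it numerically in reverse, while -k2,2 only decides between rows with equal counts.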