Using awk to get all values of the rows that share the same value in one column

loops awk

Loops 使用awk获取一列中具有相同值的各行的所有值,loops,awk,Loops,Awk,我有一个数据集（test file.csv）和树状列： node,contact,mail AAAA,Peter,peter@anything.com BBBB,Hans,hans@anything.com CCCC,Dieter,dieter@anything.com ABABA,Peter,peter@anything.com CCDDA,Hans,hans@anything.com 我喜欢通过列count扩展标题，并将node重命名为nodes。此外，所有条目都应在第二列（mail）之

I have a data set (test-file.csv) with three columns:

node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com

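I would like to extend the header with a count column and rename node to nodes. In addition, all entries should be sorted by the second column (mail). The count column should hold the number of occurrences of each mail value, and the nodes column should list every node whose row has the same mail value (space-separated and sorted alphabetically).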

This is what I am trying to achieve:

contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA

I have this awk command:

awk -F"," '
BEGIN{
  FS=OFS=",";
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
    counts[$3]++;     # Increment count of lines.
    contact[$2];      # contact
}
END {
    # Iterate over all third-column values.
    for (x in counts) {
    printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
    }
}
' test-file.csv | sort --field-separator="," --key=2 -n

However, this is my result :-( Only the occurrence count works:

,dieter@anything.com,1,nodes
,hans@anything.com,2,nodes
,peter@anything.com,2,nodes
contact,mail,count,nodes

Thanks for your help!

With your shown samples, please try the following. Written and tested with GNU awk.

awk '
BEGIN{ FS=OFS="," }
FNR==1{
  sub(/^[^,]*,/,"")
  $1=$1
  print $0,"count,nodes"
}
FNR>1{
  nf=$2
  mail[nf]=$NF
  NF--
  arr[nf]++
  val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
  for(i in arr){
    print i,mail[i],arr[i],val[i] | "sort -t, -k1"
  }
}
' Input_file

Explanation: adding a detailed explanation of the above.

awk '                                 ##Starting awk program from here.
BEGIN{ FS=OFS="," }                   ##In BEGIN section setting FS, OFS as comma here.   
FNR==1{                               ##if this is first line then do following.
  sub(/^[^,]*,/,"")                   ##Substituting everything till 1st comma here with NULL in current line.
  $1=$1                               ##Reassigning 1st field to itself.
  print $0,"count,nodes"            ##Printing headers as per need to terminal.
}
FNR>1{                                ##If line is Greater than 1st line then do following.
  nf=$2                               ##Creating nf with 2nd field value here.
  mail[nf]=$NF                        ##Creating mail with nf as index and value is last field value.
  NF--                                ##Decreasing value of current number of fields by 1 here.
  arr[nf]++                           ##Creating arr with index of nf and keep increasing its value with 1 here.
  val[nf]=(val[nf]?val[nf] " ":"")$1  ##Creating val with index of nf and keep adding $1 value in it.
}
END{                                  ##Starting END block of this program from here.
  for(i in arr){                      ##Traversing through arr in here.
    print i,mail[i],arr[i],val[i] | "sort -t, -k1"  ##printing values to get expected output and sorting it also by pipe here as per requirement.
  }
}
' Input_file                          ##Mentioning Input_file name here.
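The question also asks for the nodes inside each group to be sorted alphabetically. The program above appends them in input order, which happens to be alphabetical for the shown samples. As a sketch under that observation, the data rows can be pre-sorted by mail and node before they reach awk, after which a single pass is enough to build each group. The function name `group_nodes` is hypothetical, and the sketch assumes a simple CSV without quoted fields or embedded commas.

```shell
#!/bin/sh
# Sketch only: group nodes by mail, nodes sorted alphabetically inside each group.
# Assumes a simple CSV without quoted fields or embedded commas.
group_nodes() {
  # $1: path to the CSV file (header in line 1)
  printf '%s\n' 'contact,mail,count,nodes'
  # Skip the header, then order rows by mail (field 3) and node (field 1).
  tail -n +2 "$1" | LC_ALL=C sort -t, -k3,3 -k1,1 |
  awk 'BEGIN { FS = OFS = "," }
       {
         if (NR > 1 && $3 != mail) {   # mail changed: flush the previous group
           print contact, mail, count, nodes
           count = 0; nodes = ""
         }
         contact = $2; mail = $3
         count++
         nodes = (nodes == "" ? $1 : nodes " " $1)
       }
       END { if (NR > 0) print contact, mail, count, nodes }'
}
```

Calling `group_nodes Input_file` prints the header followed by one line per mail address. Because the rows arrive already ordered, no arrays are needed and the output is ordered by mail.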

Second solution: if you want to sort by the second and third fields, try the following.

awk '
BEGIN{ FS=OFS="," }
FNR==1{
  sub(/^[^,]*,/,"")
  $1=$1
    print $0,"count,nodes"
}
FNR>1{
  nf=$2 OFS $3
  NF--
  arr[nf]++
  val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
  for(i in arr){
    print i,arr[i],val[i] | "sort -t, -k1"
  }
}
'  Input_file

You can use this GNU awk command:

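awk '
BEGIN {
   FS=OFS=","
   printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
   ++counts[$3]     # Increment count of lines.
   name[$3] = $2
   map[$3] = ($3 in map ? map[$3] " " : "") $1
}
END {
   # Iterate over all third-column values.
   PROCINFO["sorted_in"] = "@ind_str_asc";
   for (k in counts)
      print name[k], k, counts[k], map[k]
}
' test-file.csv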

Output:

contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA

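How can I change PROCINFO["sorted_in"]="@ind_str_asc" to sort, for example, by the count column?

Sure. Just use PROCINFO["sorted_in"]="@val_num_desc" in the code above. Because the loop iterates over counts, this orders the rows by the count value, largest first.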
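PROCINFO["sorted_in"] is specific to GNU awk. Where only another awk is available, one option is to leave the ordering to sort instead. The following is a rough sketch of that idea: it reads the grouped CSV on standard input, passes the header through untouched, and orders the remaining rows by the count column, largest first, with ties broken by mail. The function name `sort_by_count` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: order grouped rows by the count column without gawk.
# Expects the grouped CSV (contact,mail,count,nodes) on standard input.
sort_by_count() {
  # Pass the header line through first so it stays on top.
  IFS= read -r header && printf '%s\n' "$header"
  # Field 3 numeric descending, then field 2 (mail) ascending for ties.
  LC_ALL=C sort -t, -k3,3nr -k2,2
}
```

It would be used at the end of a pipeline, for example `awk '...' test-file.csv | sort_by_count`, where the awk program prints the header first and the grouped rows in any order.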