Awk: Transpose in Unix


I have hourly data recorded in a file in this form:

2015-09-03 02:00:00 to 2015-09-03 02:59:59|ABC|673
2015-09-03 02:00:00 to 2015-09-03 02:59:59|AABC|52
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ABCD|787
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ADFGE|35
2015-09-03 02:00:00 to 2015-09-03 02:59:59|AGER|41
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ETECFF|1384
2015-09-03 02:00:00 to 2015-09-03 02:59:59|TRIFD|38
2015-09-03 02:00:00 to 2015-09-03 02:59:59|CVGFFHG|166
2015-09-03 03:00:00 to 2015-09-03 03:59:59|FJREER|36
2015-09-03 03:00:00 to 2015-09-03 03:59:59|DFSD|31
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ASBF|38
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|36
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AABC|35
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABCD|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ADFGE|39
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AGER|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ETECFF|537
2015-09-03 03:00:00 to 2015-09-03 03:59:59|TRIFD|620635
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|37
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AABC|702
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABCD|319
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ADFGE|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AGER|306
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ETECFF|34
2015-09-03 03:00:00 to 2015-09-03 03:59:59|TRIFD|44
2015-09-03 03:00:00 to 2015-09-03 03:59:59|CVGFFHG|599
2015-09-03 03:00:00 to 2015-09-03 03:59:59|FJREER|30
2015-09-03 03:00:00 to 2015-09-03 03:59:59|DFSD|82
I want to transform the data as follows:

1. Column 1 should go in as column header 
2. Column 2 should go in row header
3. Column 3 is data
4. Any absence of data should be represented as 0 (Zero)
This is what the transposed data should look like:

|2015-09-03 02:00:00 to 2015-09-03 02:59:59|2015-09-03 03:00:00 to 2015-09-03 03:59:59
AABC|52|737
ABC|0|73
ABCD|787|352
ADFGE|35|72
AGER|41|339
ASBF|0|38
CVGFFHG|166|599
DFSD|0|113
ETECFF|1384|571
FJREER|0|66
TRIFD|38|620679

I tried using sed, but that did not work. I am not very good with awk yet and have not reached an advanced level, so I need help here.

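I think in awk you can create an array indexed by strings, i.e. a dictionary keyed on column 1.

Each element of that array should in turn hold another string-indexed array, keyed on column 2.

Then process each line, creating new array elements where necessary and adding column 3 to the stored value.

Help on the syntax in awk:


See example 1 in section 5 for how easy the final solution turns out to be.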
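The accumulation step described above can be sketched as follows. This is a minimal sketch, not the answerer's own code: POSIX awk has no nested arrays, so it assumes a composite key `total[$1, $2]` in place of a dictionary of dictionaries, and the function name `sum_pairs` and the short headers `h1`/`h2` are made up for illustration.

```shell
# Hypothetical helper: sum column 3 for every (column 1, column 2) pair.
sum_pairs() {
  awk -F'|' '
    { total[$1, $2] += $3 }            # composite key stands in for total[$1][$2]
    END {
      for (k in total) {
        split(k, part, SUBSEP)         # part[1] = column 1, part[2] = column 2
        print part[1] "|" part[2] "|" total[k]
      }
    }' | LC_ALL=C sort                 # "for (k in ...)" order is unspecified, so sort
}

printf '%s\n' 'h1|ABC|36' 'h1|AABC|35' 'h1|ABC|37' 'h2|ABC|5' | sum_pairs
```

The two `h1|ABC` rows collapse into a single `h1|ABC|73` line. Turning this long form into the requested wide table still requires remembering the column order, which is what the full answers do.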

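Here is a solution using awk. In the 2D array values it accumulates all rows that share the same key and the same header-column index i. At the END it prints all of these for every key and column. The array cols is used to detect a change of header column, hdrs keeps the headers in the correct output order, and keys just holds the list of all keys. Note that asorti is a gawk extension, so this needs GNU awk.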

awk -F'|' '
{ hdr = $1; key = $2; val = $3;
  if(cols[hdr]==0){
    cols[hdr] = ++column;
    hdrs[column] = hdr;
  }
  i = cols[hdr]
  keys[key] = 1
  values[i, key] += val
}
END{
  for(i = 1;i<=column;i++)
   printf  "|%s", hdrs[i]
  printf "\n"
  n = asorti(keys,sort)
  for(j = 1;j<=n;j++){
     key = sort[j]
     printf "%s",key
     for(i = 1;i<=column;i++)
      printf "|%s", values[i, key]+0
     printf "\n"
  }
}'
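Where GNU awk is not available, the same idea can be sketched without asorti by letting the external sort command order the data rows. This is a variant under stated assumptions, not the answer's own code: the wrapper name `pivot_portable` is hypothetical, input is read from stdin, and the sample rows use made-up short headers `h1`/`h2`.

```shell
# Hypothetical wrapper: reads pipe-delimited records on stdin.
pivot_portable() {
  awk -F'|' '
    {
      if (!($1 in cols)) { cols[$1] = ++column; hdrs[column] = $1 }
      keys[$2] = 1
      values[cols[$1], $2] += $3
    }
    END {
      line = ""
      for (i = 1; i <= column; i++) line = line "|" hdrs[i]
      print line                                  # header row comes first
      for (key in keys) {                         # unspecified order, sorted below
        line = key
        for (i = 1; i <= column; i++) line = line "|" (values[i, key] + 0)
        print line
      }
    }' | {
      IFS= read -r header && printf '%s\n' "$header"   # pass the header through unsorted
      LC_ALL=C sort                                     # sort only the data rows
    }
}

printf '%s\n' 'h1|ABC|673' 'h1|AABC|52' 'h2|ABC|36' 'h2|ABC|37' 'h2|DFSD|31' | pivot_portable
```

As in the gawk version, adding 0 to a cell that was never assigned is what turns missing data into 0.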
Another awk:

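awk -F '|' '
  {
  # Sum values that share the same header ($1) and record key ($2)
  Data[ $1, $2] += $3
  # Append each new header / key once; index() is a literal, delimiter-anchored search
  if( index( "|" Headers, "|" $1 "|" ) == 0 ) Headers = Headers $1 "|"
  if( index( "|" Records, "|" $2 "|" ) == 0 ) Records = Records $2 "|"
  }
END {
  # Drop the trailing separator so split() yields no empty last element
  sub( /\|$/, "", Headers )
  sub( /\|$/, "", Records )
  cHeader = split( Headers, aHeader, "|" )
  cRecord = split( Records, aRecord, "|" )

  print "|" Headers

  for( iRecord = 1; iRecord <= cRecord; iRecord++) {
     printf "%s", aRecord[ iRecord]
     for( iHeader = 1; iHeader <= cHeader; iHeader++ ) {
        # "+ 0" prints 0 for a header/key pair that never occurred
        printf "|%s", Data[ aHeader[ iHeader], aRecord[ iRecord] ] + 0
        }
     print ""
     }
  }
' YourFile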
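Where do the numbers come from? Not all of them seem to match the input. Also, a pivot and a sum seem to be required.

That is correct, a sum is needed after pivoting. ABC|673, ABC|36 and ABC|37 should become: ABC|0|73

1) ABC|673, ABC|36 and ABC|37 should become ABC|0|73, yet AABC|52, AABC|35 and AABC|702 should become AABC|52|737. Can you explain why this is not ABC|673|73? 2) Regarding the fourth assertion you wrote, about absent data: which line in your data does it relate to?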
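The discrepancy raised in the comments can be checked with a quick per-hour sum for a single key. This is only a sketch: the helper name `sums_for_key` is hypothetical, and the input is limited to the three ABC rows quoted in the comment.

```shell
# Hypothetical helper: per-header totals for one key (exact match on column 2).
sums_for_key() {
  awk -F'|' -v key="$1" '
    $2 == key { total[$1] += $3 }
    END { for (h in total) print h "|" total[h] }' | LC_ALL=C sort
}

# The three ABC rows quoted in the comment
printf '%s\n' \
  '2015-09-03 02:00:00 to 2015-09-03 02:59:59|ABC|673' \
  '2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|36' \
  '2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|37' | sums_for_key ABC
```

For these rows the totals are 673 for the 02:00 hour and 73 for the 03:00 hour, which is why the commenter expects ABC|673|73 rather than ABC|0|73 if plain summing is the rule.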