Group by and sum - AWK script
I have millions of records that look like this:
TOTALOCTETSUNIT SERVEDACCOUNT SERVICECLASSID ACCUMULATEDUNITS ACCOUNTUNITSDEDUCTED ACCOUNTVALUEBEFORE ACCOUNTVALUEAFTER
850 66498336 70 10240 10240 0.083333 0.083333
259 64625247 41 10240 10240 65.500000 65.50000
219792 76608974 35 225280 225280 653.049798 653.049798
15261 76900654 35 20480 20480 35.516666 35.516666
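I have to group by SERVEDACCOUNT and then by SERVICECLASSID, and then, for each resulting group, sum TOTALOCTETSUNIT, ACCUMULATEDUNITS, ACCOUNTUNITSDEDUCTED and ACCOUNTVALUEBEFORE.
This would not be a problem if the grouping were based on a single field, but we have to group by two fields.
Below is the awk script I am using, saved as test.awk: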
BEGIN { FS = "|" } NR > 2500 {exit}
1 < NR && NR <= 2500 {
#sub(/ .*/,"",$4)
key=$3
TOTOCTET[key]+=$1
ACCUNITS[key]+=$4
ACCUNITTED[key]+=$5
ACCVALBEF[key]+=$6} END {
printf "%-13s %18s %18s %18s %18s\n",
"SERVEDACCOUNT","TOTALOCTETSUNIT","ACCUMULATEDUNITS","ACCOUNTUNITSDEDUCTED","ACCOUNTVALUEBEFORE"
for (i in TOTOCTET) {
printf "%-4s %16.6f %16.6f %16.6f %16.6f\n",
i,TOTOCTET[i],ACCUNITS[i],ACCUNITTED[i],ACCVALBEF[i] }
}
Here is the output I want:
SERVEDACCOUNT SERVICECLASSID TOTALOCTETSUNIT ACCUMULATEDUNITS ACCOUNTUNITSDEDUCTED ACCOUNTVALUEBEFORE
64625247 41 259 10240 10240 65,5
66498336 70 850 10240 10240 0,083333
76608974 35 219792 225280 225280 653,049798
76900654 35 15261 20480 20480 35,516666
Currently the key is set to $3, but if the key has to be determined by both SERVEDACCOUNT and SERVICECLASSID, it should be built from $2 and $3, for example:
BEGIN { FS = "\t" }
1 < NR && NR <= 2500 {
key=$2 "-" $3
TOTOCTET[key]+=$1
ACCUNITS[key]+=$4
ACCUNITTED[key]+=$5
ACCVALBEF[key]+=$6} END {
printf "%-13s %18s %18s %18s %18s %18s\n",
"SERVEDACCOUNT","SERVICECLASSID","TOTALOCTETSUNIT",
"ACCUMULATEDUNITS","ACCOUNTUNITSDEDUCTED","ACCOUNTVALUEBEFORE"
for (i in TOTOCTET) {
split(i,ii,/-/)
printf "%-16s %-16s %16.0f %16.0f %16.0f %16.0f\n",
ii[1],ii[2],TOTOCTET[i],ACCUNITS[i],ACCUNITTED[i],ACCVALBEF[i]
}
}
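As a side note, awk also has a built-in way to build a two-field key: joining the fields with SUBSEP (which is what the subscript [$2,$3] does internally) avoids having to pick a separator such as "-" that might one day appear in the data. The following is only a minimal sketch, assuming whitespace-separated input with one header line; the function name group_by_two_fields and the array names are made up for illustration, and only two of the sums are shown.

```shell
# Hypothetical helper: reads records on stdin, prints one line per
# (SERVEDACCOUNT, SERVICECLASSID) pair with two of the sums.
group_by_two_fields() {
  awk '
    NR > 1 {                       # skip the header line
      key = $2 SUBSEP $3           # same key as the subscript [$2,$3]
      octets[key] += $1            # TOTALOCTETSUNIT
      units[key]  += $4            # ACCUMULATEDUNITS
    }
    END {
      for (key in octets) {
        split(key, part, SUBSEP)   # recover the two original fields
        printf "%s %s %.0f %.0f\n", part[1], part[2], octets[key], units[key]
      }
    }
  '
}
```

It could be called as group_by_two_fields < data.txt, where data.txt stands for your own input file.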
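Thanks for the answer, @Simon! I ran the code you gave, but I still get the same output as before (a set of zeros)! Any idea what the problem is?
I would suggest checking the field separator (FS) you are using. You originally used a pipe (|), but I changed it to a tab, guessing that this was the actual field separator, and tested with a tab-separated file. If the fields are separated by spaces or other whitespace, FS needs to be set accordingly. Since there are no spaces in the labels or in the field contents, splitting on whitespace is appropriate. That is the default, so you do not have to set FS at all.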
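A quick way to check which separator actually fits the file is to print NF, the number of fields awk sees on the first line. If FS does not occur in the line, awk treats the whole line as $1, so $2 to $6 are empty and the sums built from them come out as zero. This is a rough sketch; count_fields is a hypothetical helper name, not something from the script above.

```shell
# Hypothetical helper: print how many fields awk finds on the first
# line of stdin when the separator given as the first argument is used.
count_fields() {
  awk -F "$1" 'NR == 1 { print NF }'
}
```

For the seven-column sample data, count_fields ' ' < data.txt should print 7 if the columns are separated by spaces, while count_fields '|' < data.txt would print 1 (data.txt stands for your own input file).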
Output of the answer's script:
SERVEDACCOUNT SERVICECLASSID TOTALOCTETSUNIT ACCUMULATEDUNITS ACCOUNTUNITSDEDUCTED ACCOUNTVALUEBEFORE
76900654 35 15261 20480 20480 36
66498336 70 850 10240 10240 0
64625247 41 259 10240 10240 66
76608974 35 219792 225280 225280 653
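Two details in this output differ from what the question asked for: the %16.0f format rounds ACCOUNTVALUEBEFORE to a whole number, and the rows are not ordered by SERVEDACCOUNT, because the order of a for (i in array) loop is unspecified in awk. One possible way to handle both is to print the value with %.6f and pipe the result through sort. This is a sketch under the same assumptions as before (whitespace-separated input, one header line); summarize_sorted is a made-up name and only two of the sums are shown.

```shell
# Hypothetical helper: sums per (SERVEDACCOUNT, SERVICECLASSID) pair,
# keeps six decimals for ACCOUNTVALUEBEFORE, sorts by the two key columns.
summarize_sorted() {
  awk '
    NR > 1 {                       # skip the header line
      key = $2 "-" $3
      octets[key] += $1            # TOTALOCTETSUNIT
      before[key] += $6            # ACCOUNTVALUEBEFORE
    }
    END {
      for (key in octets) {
        split(key, part, "-")
        printf "%s %s %.0f %.6f\n", part[1], part[2], octets[key], before[key]
      }
    }
  ' | sort -k1,1n -k2,2n           # numeric sort on account, then class id
}
```

Sorting outside awk keeps the script portable; gawk users could instead control the loop order inside awk, but that is gawk-specific.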