Group by and sum - AWK script


I have millions of records that look like this:

TOTALOCTETSUNIT SERVEDACCOUNT   SERVICECLASSID  ACCUMULATEDUNITS    ACCOUNTUNITSDEDUCTED    ACCOUNTVALUEBEFORE  ACCOUNTVALUEAFTER
850             66498336         70             10240                10240                   0.083333           0.083333
259             64625247         41             10240                10240                   65.500000          65.50000
219792          76608974         35             225280               225280                  653.049798         653.049798
15261           76900654         35             20480                20480                   35.516666          35.516666
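I have to group by SERVEDACCOUNT and then by SERVICECLASSID, and for each resulting group I have to sum TOTALOCTETSUNIT, ACCUMULATEDUNITS, ACCOUNTUNITSDEDUCTED and ACCOUNTVALUEBEFORE. It would not be a problem if the grouping were based on a single field, but here we have to group by two fields.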

Below is the awk script I am using, saved as test.awk:

BEGIN { FS = "|" } NR > 2500 {exit}            
1 < NR && NR <= 2500 { 
#sub(/ .*/,"",$4)      
key=$3
TOTOCTET[key]+=$1
ACCUNITS[key]+=$4
ACCUNITTED[key]+=$5
ACCVALBEF[key]+=$6} END {
printf "%-13s %18s %18s %18s %18s\n", 
    "SERVEDACCOUNT","TOTALOCTETSUNIT","ACCUMULATEDUNITS","ACCOUNTUNITSDEDUCTED","ACCOUNTVALUEBEFORE" 
for (i in TOTOCTET) { 
    printf "%-4s %16.6f %16.6f %16.6f %16.6f\n", 
        i,TOTOCTET[i],ACCUNITS[i],ACCUNITTED[i],ACCVALBEF[i] }
}
Here is the output I want:

SERVEDACCOUNT   SERVICECLASSID  TOTALOCTETSUNIT ACCUMULATEDUNITS    ACCOUNTUNITSDEDUCTED    ACCOUNTVALUEBEFORE
64625247         41               259           10240                  10240                  65,5
66498336         70               850           10240                  10240                 0,083333
76608974         35               219792        225280                225280                  653,049798
76900654         35               15261          20480                 20480                   35,516666

Currently the key is set to $3, but if the key must be determined by both SERVEDACCOUNT and SERVICECLASSID, it should be built from $2 and $3, for example:

BEGIN { FS = "\t" } 
1 < NR && NR <= 2500 {
    key=$2 "-" $3
    TOTOCTET[key]+=$1
    ACCUNITS[key]+=$4
    ACCUNITTED[key]+=$5
    ACCVALBEF[key]+=$6} END {
    printf "%-13s %18s %18s %18s %18s %18s\n", 
    "SERVEDACCOUNT","SERVICECLASSID","TOTALOCTETSUNIT",
    "ACCUMULATEDUNITS","ACCOUNTUNITSDEDUCTED","ACCOUNTVALUEBEFORE" 
    for (i in TOTOCTET) {
        split(i,ii,/-/)
        printf "%-16s %-16s %16.0f %16.0f %16.0f %16.0f\n", 
        ii[1],ii[2],TOTOCTET[i],ACCUNITS[i],ACCUNITTED[i],ACCVALBEF[i] 
        }
    }
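
As a variation on the script above, awk's built-in multi-subscript syntax can build the two-field key without choosing a joining character by hand. The sketch below is only an illustration under stated assumptions: the input is whitespace-separated with a single header line, the function name `group_sum` and the array names are hypothetical, and the balance column is printed with six decimals because the desired output in the question shows fractional values.

```shell
# Hypothetical helper: group by fields 2 and 3, sum fields 1, 4, 5 and 6.
# Assumes whitespace-separated input with exactly one header line.
group_sum() {
  awk '
    NR > 1 {                      # skip the header line
      oct[$2, $3] += $1           # awk joins ($2, $3) with SUBSEP internally
      acc[$2, $3] += $4
      ded[$2, $3] += $5
      bal[$2, $3] += $6
    }
    END {
      for (k in oct) {
        split(k, part, SUBSEP)    # recover the two key fields
        printf "%s %s %.0f %.0f %.0f %.6f\n",
               part[1], part[2], oct[k], acc[k], ded[k], bal[k]
      }
    }
  ' "$@" | sort -k1,1n -k2,2n     # for (k in ...) order is unspecified, so sort
}
```

It could be called as `group_sum data.txt`, where `data.txt` stands in for the real file. Using SUBSEP avoids a wrong split if a key field ever contains the "-" character.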

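Thanks @Simon for the answer! I ran the given code but still get the same output as before (a bunch of zeros)! Any idea what the problem is?

I suggest checking the field separator (FS) you are using. You originally used a pipe (|), but I changed it to a tab, guessing that this was the actual field separator, and tested with a tab-separated file. If the fields are separated by spaces or other whitespace, you need to set FS accordingly.

Since there are no spaces in the labels or in the field contents, splitting on whitespace is appropriate. That is the default, so you do not need to set FS at all.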
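
To make the separator point concrete, here is a small sketch that splits one record from the question with different FS values. It assumes the real file uses spaces between columns, which the thread does not confirm; the helper name `show_split` is hypothetical.

```shell
# Hypothetical helper: $1 is the separator to try; an empty string keeps awk's default FS.
show_split() {
  awk -v sep="$1" '
    BEGIN { if (sep != "") FS = sep }   # empty sep: leave the default FS in place
    { print NF, "[" $3 "]" }            # field count and the third field
  '
}

# One record from the question, assumed here to be space-separated.
line='850             66498336         70             10240'

printf '%s\n' "$line" | show_split '|'    # prints: 1 []
printf '%s\n' "$line" | show_split '\t'   # prints: 1 []
printf '%s\n' "$line" | show_split ''     # prints: 4 [70]
```

When the separator does not occur in the line, the whole record becomes $1 and every other field is empty, so sums over $4, $5 and $6 come out as 0.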

Output of the answer's script on the tab-separated test file:

SERVEDACCOUNT     SERVICECLASSID    TOTALOCTETSUNIT   ACCUMULATEDUNITS ACCOUNTUNITSDEDUCTED ACCOUNTVALUEBEFORE
76900654         35                          15261            20480            20480               36
66498336         70                            850            10240            10240                0
64625247         41                            259            10240            10240               66
76608974         35                         219792           225280           225280              653