Subtotal 2 columns based on a third column in a CSV file, using AWK in KSH


Disclaimers:

    1) English is my second language, so please forgive any grammatical horrors you may find. I am pretty confident you will be able to understand what I need despite these.
    2) I have found several examples in this site that address questions/problems similar to mine, though I was unfortunately not able to figure out the modifications that would need to be introduced to fit my needs.
The problem:

I have a CSV file that looks like this:

c1,c2,c3,c4,c5,134.6,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0.18,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,416.09,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,12.1,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,480.64,,c8,c9,SERVER4,c11
c1,c2,c3,c4,c5,,83.65,c8,c9,SERVER5,c11
c1,c2,c3,c4,c5,,253.15,c8,c9,SERVER6,c11
c1,c2,c3,c4,c5,,18.84,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,8.12,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,22.45,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,117.81,c8,c9,SERVER8,c11
c1,c2,c3,c4,c5,,96.34,c8,c9,SERVER9,c11
Complementary facts:

    1) File has 11 columns.
    2) The data in columns 1, 2, 3, 4, 5, 8, 9 and 11 is irrelevant in this case. In other words, I will only work with columns 6, 7 and 10.
    3) Column 10 will typically contain alphanumeric strings (server names), though it may also contain "-" and/or "_".
    4) Columns 6 and 7 will have exclusively numbers, with up to two decimal places (A possible value is 0). Only one of the two will have data per line, never both.
What I need as output:

    - A single occurrence of every string in column 10 (as column 1), then the sum (subtotal) of its values in column 6 (as column 2) and last, the sum (subtotal) of its values in column 7 (as column 3).
    - If the total for a field is "0", the field must be left empty but must still exist (its respective comma has to be printed).
    - **Note** that the strings in column 10 will be already alphabetically sorted, so there is no need to do that part of the processing with AWK.
Output sample, using the sample above as input:

SERVER1,134.6,,
SERVER2,0.18,,
SERVER3,428.19,,
SERVER4,480.64,,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,26.96
I have already found not one but two AWK one-liners on this site that partially do what I need:

awk -F "," 'NR==1{last=$10; sum=0;}{if (last != $10) {print last "," sum; last=$10; sum=0;} sum += $6;}END{print last "," sum;}' inputfile


awk -F, '{a[$10]+=$6;}END{for(i in a)print i","a[i];}' inputfile
My "problem" is the same in both cases:

    - Subtotals of 0 are printed.
    - I can only handle the sum of one column at a time. Whenever I try to add the second one, I either get a syntax error or the third column is simply not printed at all.
Thanks in advance for your support!
Regards,
Martin

Something like this?

$ awk 'BEGIN{FS=OFS=","} 
            {s6[$10]+=$6; s7[$10]+=$7} 
         END{for(k in s6) print k,(s6[k]?s6[k]:""),(s7[k]?s7[k]:"")}' file | sort

SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34
Note that your handling of commas is inconsistent: you add an extra comma when the last field is zero (count the commas).
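
If the input is not sorted and the original order matters, one option is to keep the array-based approach but record the order in which each server name first appears, so that no `sort` is needed afterwards. This is only a sketch under assumptions: the temporary sample file, the `seen` and `order` arrays, and the `ordered` shell variable are illustrative names, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical sample input, deliberately NOT sorted by server name.
input=$(mktemp)
cat > "$input" <<'EOF'
c1,c2,c3,c4,c5,134.6,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,,18.84,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,8.12,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER2,c11
EOF

ordered=$(awk '
BEGIN { FS = OFS = "," }
# Remember each server name the first time it is seen.
!($10 in seen) { seen[$10] = 1; order[++n] = $10 }
# Accumulate both subtotals per server name.
{ s6[$10] += $6; s7[$10] += $7 }
END {
    # Walk the names in first-seen order; a zero subtotal becomes an empty field.
    for (i = 1; i <= n; i++) {
        k = order[i]
        print k, (s6[k] ? s6[k] : ""), (s7[k] ? s7[k] : "")
    }
}' "$input")

printf '%s\n' "$ordered"
rm -f "$input"
```

The trade-off is unchanged from the array version: every distinct server name is still held in memory until the end of the input.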

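The expected output you posted doesn't seem to match the sample input you posted, so we're guessing, but this might be what you're looking for: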

$ cat tst.awk
BEGIN { FS=OFS="," }
$10 != prev {
    if (NR > 1) {
        print prev, sum6, sum7
    }
    sum6 = sum7 = ""
    prev = $10
}
$6  { sum6 += $6 }
$7  { sum7 += $7 }
END { print prev, sum6, sum7 }

$ awk -f tst.awk file
SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34
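
One thing to watch for with larger subtotals: `print` converts non-integer numbers using `OFMT`, whose default is `%.6g`, so a sum such as 123456.79 would be printed as 123457. Since the question says the values have at most two decimal places, a possible workaround is to format the sums explicitly. The sketch below assumes a fixed two-decimal output is acceptable (so 134.6 would appear as 134.60); `fmt` is a hypothetical helper name and the sample data is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input with a subtotal that needs more than 6 significant digits.
input=$(mktemp)
cat > "$input" <<'EOF'
c1,c2,c3,c4,c5,123456.78,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0.01,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,,0,c8,c9,SERVER2,c11
EOF

formatted=$(awk '
BEGIN { FS = OFS = "," }
# fmt(): empty string for an empty or zero total, otherwise two decimals.
function fmt(x) { return (x ? sprintf("%.2f", x) : "") }
$10 != prev {
    if (NR > 1) print prev, fmt(sum6), fmt(sum7)
    sum6 = sum7 = ""
    prev = $10
}
$6 { sum6 += $6 }
$7 { sum7 += $7 }
END { print prev, fmt(sum6), fmt(sum7) }
' "$input")

printf '%s\n' "$formatted"
rm -f "$input"
```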

karakfa: You're right about the inconsistent commas. Since my code wasn't working, I had to put the desired output together by hand, hence the mistake. By the way, your code works like a charm:

cat input.csv | awk 'BEGIN{FS=OFS=","};{s6[$10]+=$6;s7[$10]+=$7};END{for(k in s6)print k,(s6[k]?s6[k]:""),(s7[k]?s7[k]:"")}'
SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34

Thank you very much. I've marked your answer as the correct one!

You're welcome. Note that you don't need to prepend cat; awk can open the file itself. You also don't need to pick the first answer; pick the one you think solves the problem best. In the long run, that is the better approach. The advantage of this approach is that the input doesn't need to be sorted. The disadvantage is that the output needs to be sorted afterwards. If you don't have a sortable key and want to preserve the order of the input file, the other solution may be better.

Ed: You're right about the output not matching the input. Since my code wasn't working, I had to put the desired output together by hand, hence the mistake. By the way, your code works like a charm, too:

cat input.csv | awk 'BEGIN{FS=OFS=","} $10 != prev{if(NR>1){print prev,sum6,sum7};sum6=sum7="";prev=$10} $6{sum6+=$6} $7{sum7+=$7};END{print prev,sum6,sum7}'
SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34

Thank you very much! I marked karakfa's answer (which also works) as the correct one, since he answered earlier and that seemed the fairest thing to do. I wanted to mark your reply as well, but doing so seemed to deselect karakfa's. If there is a way to mark both, please let me know and I will certainly do so! Thanks again for taking the time to solve this ;)

No problem. The main difference between the 2 is that mine will always print the output in the order the servers appear in the input, whereas karakfa's will rearrange the lines alphabetically by server name. Add a line for SERVER10 at the end of the input and try both solutions to see what I mean (SERVER10 will appear after SERVER9 with mine, but between SERVER1 and SERVER2 with karakfa's). Also, if your file is very large you'll want to use mine, since mine only stores 2 sums in memory, whereas karakfa's stores both sums in arrays indexed by every server name.
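
The SERVER10 experiment suggested in the comments can be reproduced with a small script. This is a sketch under assumptions: the shortened sample data, the temporary file, and the `by_array` and `by_stream` variable names are made up for illustration, and `LC_ALL=C` is set on `sort` so that lines are compared byte by byte (other locales may order these lines differently).

```shell
#!/bin/sh
# Hypothetical shortened input with SERVER10 appended at the end.
input=$(mktemp)
cat > "$input" <<'EOF'
c1,c2,c3,c4,c5,134.6,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0.18,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,,96.34,c8,c9,SERVER9,c11
c1,c2,c3,c4,c5,,5.5,c8,c9,SERVER10,c11
EOF

# Array-based approach: totals per name, sorted afterwards.
by_array=$(awk 'BEGIN { FS = OFS = "," }
    { s6[$10] += $6; s7[$10] += $7 }
    END { for (k in s6) print k, (s6[k] ? s6[k] : ""), (s7[k] ? s7[k] : "") }' "$input" | LC_ALL=C sort)

# Streaming approach: prints each group as soon as the name changes.
by_stream=$(awk 'BEGIN { FS = OFS = "," }
    $10 != prev { if (NR > 1) print prev, sum6, sum7; sum6 = sum7 = ""; prev = $10 }
    $6 { sum6 += $6 }
    $7 { sum7 += $7 }
    END { print prev, sum6, sum7 }' "$input")

printf 'array + sort:\n%s\n\nstreaming:\n%s\n' "$by_array" "$by_stream"
rm -f "$input"
```

With this input, the sorted array output places SERVER10 between SERVER1 and SERVER2, while the streaming output keeps it last, as in the input.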