Awk: summing time-series data

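I have time-based, per-minute data and want to aggregate it into hourly totals (or other periods, such as weeks or months).

The data looks like this: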

timeStamp,kwH,watts
"2016-07-16 16:18:51",0.014,710
"2016-07-16 16:20:01",0.013,669
"2016-07-16 16:22:40",0.020,720
...
"2016-07-16 21:06:01",0.006,360
"2016-07-16 21:07:00",0.006,366
"2016-07-16 21:08:01",0.007,413
"2016-07-16 21:09:01",0.006,360
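I want to sum the second column (kwH), grouped by the hour taken from column 1.

A larger data set is available from

How can I add these up? I'm guessing it involves awk.

Second, given that the data, the web service, and the bash script that generates the graph all live on a server I control, would it be more efficient to sum this data in MySQL than to have gnuplot process megabytes of raw data?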

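Please edit your question to clean up the messy sample input, so we have something concrete to test a potential solution against, and add the expected output for that input.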
$ cat > test.awk
BEGIN { FS = "," }        # the input is comma-separated
{
  gsub(/^.* |:.*/,"",$1)  # strip everything but the hour from the timestamp, to group by hour
  arr[$1]+=$2             # sum the "kwH" per hour
}
END {                     # after summing, print
  for (i in arr)          # for each element (hour) in the array
    print i,arr[i]        # print the hour and its "kwH" sum
}
$ awk -f test.awk test.in
timeStamp 0
21 0.025
... 0
16 0.047
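
Note that the script above keys on the hour of day alone, so 16:00 on different days would land in the same bucket, and the header and the "..." placeholder show up as zero rows. A minimal sketch of one way around both, assuming the file keeps the quoted "YYYY-MM-DD HH:MM:SS" layout from the question: bucket on a prefix of the timestamp instead. The function name `sum_kwh` and its optional length argument are hypothetical, introduced here for illustration only.

```shell
# sum_kwh FILE [PREFIX_LEN]
# Hypothetical helper: sums column 2 (kwH) of a CSV shaped like the sample,
# bucketed by the first PREFIX_LEN characters of the timestamp:
#   13 -> "YYYY-MM-DD HH" (hour, the default), 10 -> day, 7 -> month.
sum_kwh() {
  awk -F, -v len="${2:-13}" '
    NR == 1 { next }                  # skip the timeStamp,kwH,watts header
    $2 !~ /^[0-9.]+$/ { next }        # skip placeholder or malformed rows
    {
      ts = $1
      gsub(/"/, "", ts)               # drop the surrounding double quotes
      sum[substr(ts, 1, len)] += $2   # accumulate kwH per bucket
    }
    END {
      for (k in sum)
        printf "%s,%.3f\n", k, sum[k]
    }
  ' "$1" | sort                       # for-in order is unspecified, so sort
}
```

Called as `sum_kwh test.in` on the sample rows shown in the question, this should print `2016-07-16 16,0.047` and `2016-07-16 21,0.025`; `sum_kwh test.in 10` gives daily totals and `sum_kwh test.in 7` monthly ones. Weekly buckets do not fall out of a string prefix and would need date arithmetic, which this sketch does not attempt.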