Unix: calculating a weekly (7-day) moving/rolling sum

Please help me calculate a weekly moving/rolling sum of the amount ($4) per distributor ($3), based on the rolling date ($2).

I would like to set dummy parameters such as:

RollingStartDate == 01/05/2015 and RollingInterval == 7 and RollingEndDate == 08/05/2015
For example:

1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
Input.csv

Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Example: the rolling 7-day data set for 8th May 2015 runs from 08/05/2015 back to 02/05/2015:

aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Output for the 8th May 2015 rolling 7-day data set:

RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350
I am able to get the above output with this command:

awk -F, '{key=$3;b[key]=b[key]+$4} END {for(i in b) print i","b[i]}'
Please suggest how to derive the weekly data-set splits and then sum them.

Expected output:

RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350
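Edit 1

1.

The logic is to find the sum of the amounts per distributor within a 7-day range. That is, to calculate the sum for 1st May I need to consider the line items from 1st May, 30th April, 29th April, 28th April, 27th April, 26th April and 25th April, which means going back 6 days. Likewise, the 2nd May rolling date covers 2nd May back to 26th April (minus 6 days).

2.

The date format is DD/MM/YYYY, so 02/05/2015 is 2nd May 2015.

The file contains 2 to 3 months of data, so I do not want to pick the first date in the file (25/04/2015) and run the minus-6-days look-back from there. RollingStartDate says from which date the data should be considered. RollingInterval selects a 7-day, 14-day, or 30-day (monthly) look-back analysis. RollingEndDate guards against future-dated rows in the actual file, for example when line items dated 9th or 15th May need to be excluded.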
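The three parameters described above can be sketched in portable awk as follows. This is only an illustration under stated assumptions: the function name `rolling_sum`, its argument order, and the awk variables `rollStart`, `rollEnd` and `interval` are hypothetical, and dates are assumed to be valid, zero-padded DD/MM/YYYY. Instead of GNU awk time functions it converts each date to an integer day number using a standard civil-calendar formula, so stepping back one day is a plain subtraction.

```shell
# Hypothetical helper: rolling_sum START END INTERVAL FILE   (dates are DD/MM/YYYY)
rolling_sum() {
    awk -F, -v OFS=, -v rollStart="$1" -v rollEnd="$2" -v interval="$3" '
    # DD/MM/YYYY -> days since 1970-01-01 (Gregorian calendar)
    function to_days(s,    p, y, m, d, era, yoe, doy) {
        split(s, p, "/"); d = p[1] + 0; m = p[2] + 0; y = p[3] + 0
        if (m <= 2) y--
        era = int(y / 400); yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
    }
    # Inverse of to_days(): day number -> DD/MM/YYYY
    function to_date(z,    era, doe, yoe, doy, mp, d, m, y) {
        z += 719468; era = int(z / 146097); doe = z - era * 146097
        yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
        doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
        mp = int((5 * doy + 2) / 153)
        d = doy - int((153 * mp + 2) / 5) + 1
        m = (mp < 10 ? mp + 3 : mp - 9)
        y = yoe + era * 400 + (m <= 2)
        return sprintf("%02d/%02d/%04d", d, m, y)
    }
    NR == 1 { next }                        # skip the CSV header
    {
        amount[to_days($2), $3] += $4       # per-day, per-distributor total
        if (!($3 in seen)) { seen[$3]; order[++n] = $3 }   # keep first-seen order
    }
    END {
        print "RollingDate", "Distributor", "Amount"
        last = to_days(rollEnd)
        for (curr = to_days(rollStart); curr <= last; curr++)
            for (i = 1; i <= n; i++) {
                sum = 0
                # add the rolling date itself plus the (interval - 1) days before it
                for (back = 0; back < interval; back++)
                    sum += amount[curr - back, order[i]]
                print to_date(curr), order[i], sum
            }
    }' "$4"
}
```

Called as `rolling_sum 01/05/2015 08/05/2015 7 Input.csv` on the sample data, this should reproduce the expected output listed in the question. Rows dated after RollingEndDate are never summed, and rows dated before RollingStartDate only contribute to the windows that reach back to them.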
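Here is a solution that simply excludes the dates that do not have a full 7 days before them, without requiring a specific start/stop range: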

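$ cat tst.awk
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }
NR==1 { print "RollingDate", $3, $4; next }    # header: RollingDate,Distributor,Amount
{
    # DD/MM/YYYY -> "YYYY MM DD 0 0 0" -> epoch seconds at local midnight
    endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0",1,$2))
    if (begSecs=="") {
        # first rolling date that has a full window behind it
        begSecs = endSecs + ((window-1) * secsPerDay)
    }
    amount[endSecs][$3] += $4    # per-day, per-distributor total
    dists[$3]                    # remember every distributor seen
}
END {
    # Assumes the input is sorted by date, so endSecs now holds the last date.
    # Stepping by secsPerDay also assumes no DST change within the data.
    for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
        # add up the window of days ending on currSecs
        for (dayNr=1; dayNr<=window; dayNr++) {
            rollSecs = currSecs - ((dayNr-1) * secsPerDay)
            for (dist in dists) {
                sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
            }
        }
        for (dist in dists) {
            print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
            delete sum[dist]
        }
    }
}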

To use a window size other than 7 days, just set it on the command line:

$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120

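The above uses GNU awk for true 2D arrays and time functions. Hopefully it is clear enough that you can make whatever modifications you need to include/exclude specific date ranges.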
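One possible way to include/exclude a date range without touching the script is to post-filter its output on the RollingDate column. The sketch below is an assumption rather than part of the answer: the helper name `filter_rolling_range` and the awk variables `lo` and `hi` are made up for illustration, and dates are assumed to be zero-padded DD/MM/YYYY. It rewrites each date as YYYYMMDD so that a plain string comparison orders dates chronologically.

```shell
# Hypothetical helper: filter_rolling_range START END   (reads the CSV on stdin)
filter_rolling_range() {
    awk -F, -v lo="$1" -v hi="$2" '
    # DD/MM/YYYY -> YYYYMMDD, which sorts chronologically as a string
    function key(s,    p) { split(s, p, "/"); return p[3] p[2] p[1] }
    NR == 1 { print; next }                 # always keep the header
    { k = key($1) }
    k >= key(lo) && k <= key(hi)            # print rows inside the range
    '
}
```

It could then be chained after the script, for example `awk -f tst.awk file | filter_rolling_range 01/05/2015 08/05/2015`.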

I don't understand how you get the date in the output with your sample script; it should only give distributor and amount...

@Tensibai, yes, you are right. I only included the rolling date logic so that the requirement would be understood correctly...

So that is not the actual output you get, which makes it hard to understand what you actually get and what you wish to get... By the way, is awk mandatory? It does not sound like the perfect tool for this, but it could be done with a multidimensional array keyed on date/distributor, with some pain.

@Tensibai, awk is not mandatory, but I don't have access to Python and Perl.

@EdMorton, thank you for the awesome script. It works as expected, genius!!! Accepted and upvoted the answer!!!