请帮助计算基于Moving/Rolling back Weekly Sum of Amount($4)的Distributor wise ($2) and Rolling Date wise。
想要设置像
RollingStartDate ==01/05/2015 and RollingInterval==7 and RollingEndDate ==08/05/2015例如:
1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015Input.csv
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb示例:2015年5月8日滚动7天数据集将从2015年05月8日至2015年02月5日
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb2015年5月8日滚动7天数据集的产出
RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350我能够从这个命令获得上面的输出:
awk -F, '{key=$3;b[key]=b[key]+$4} END {for(i in a) print i","b[i]}'请建议如何导出每周分拆数据集,然后求和。
期望产出:
RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350Edit#1
1.
其逻辑是在7天内向分销商开出金额总和,即如果我需要计算5月1日的金额,则需要考虑从5月1日、4月30日、4月29日、4月28日、4月27日、4月26日和4月25日开始的线项目,这相当于1st May (-) minus 6 days back .同样,5月2日的滚动日期等于5月2日至5月26日( 2nd May minus 6 days back .)
2.
日期格式为DD/MM/YYYY - 02/05/2015为5月2日
"RollingStartDate"将帮助从哪个日期考虑数据,"RollingInterval"将帮助进行"7天“后移或"14天”移动回移分析或“每月30天”移回分析。"RollingEndDate"将有助于避免实际文件包含任何未来的日期数据可用,在这种情况下,如果09或15可能日期行项目需要排除.发布于 2015-05-28 18:33:57
这里有一个解决方案,它只排除没有提前7天的日期,而不需要一个特定的开始/停止范围:
$ cat tst.awk
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }
NR==1 { print "RollingDate", $3, $4; next }
{
endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0","",$2))
if (begSecs=="") {
begSecs = endSecs + ((window-1) * secsPerDay)
}
amount[endSecs][$3] += $4
dists[$3]
}
END {
for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
for (dayNr=1; dayNr<=window; dayNr++) {
rollSecs = currSecs - ((dayNr-1) * secsPerDay)
for (dist in dists) {
sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
}
}
for (dist in dists) {
print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
delete sum[dist]
}
}
}。
$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145。
要使用与7天不同的窗口大小,只需在命令行上设置它:
$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120上面使用GNU awk作为真正的2D数组和时间函数。希望这是足够清楚的,您可以做任何修改,您需要包括/排除特定的日期范围。
https://stackoverflow.com/questions/30505356
复制相似问题