我的头脑被炸了,所以希望你能在这里提供帮助:)我有一些数据,我需要按月和年来“累计”,并且无法想出如何去做。我的数据看起来是这样的- testdf:
month year power_kwh
0 1 1990 416491241.4
1 2 1990 343033068.2
2 3 1990 301451921.2
3 4 1990 275719811.1
4 5 1990 292287261.0
5 6 1990 227289410.6
6 7 1990 189335818.5
7 8 1990 115649657.7
8 9 1990 191111685.6
9 10 1990 319101975.4
10 11 1990 366723962.5
11 12 1990 392145037.3
12 1 1991 333036371.8
13 2 1991 325317995.1
14 3 1991 370866553.8
15 4 1991 300957433.0
16 5 1991 223408949.5
17 6 1991 165998110.1
18 7 1991 147475868.2
19 8 1991 158274746.0
20 9 1991 181327778.7
21 10 1991 284857327.5
22 11 1991 298116425.1
23 12 1991 363653602.6我需要我的数据如下所示,所以该列(power_kwh)按月份和每个唯一的年份进行求和。
month year power_kwh cumsum_power_kwh
0 1 1990 416491241.4 416491241
1 2 1990 343033068.2 759524310
2 3 1990 301451921.2 1060976231
3 4 1990 275719811.1 1336696042
4 5 1990 292287261 1628983303
5 6 1990 227289410.6 1856272714
6 7 1990 189335818.5 2045608532
7 8 1990 115649657.7 2161258190
8 9 1990 191111685.6 2352369875
9 10 1990 319101975.4 2671471851
10 11 1990 366723962.5 3038195813
11 12 1990 392145037.3 3430340851
12 1 1991 333036371.8 333036372
13 2 1991 325317995.1 4088695217
14 3 1991 370866553.8 4459561771
15 4 1991 300957433 4760519204
16 5 1991 223408949.5 4983928154
17 6 1991 165998110.1 5149926264
18 7 1991 147475868.2 5297402132
19 8 1991 158274746 5455676878
20 9 1991 181327778.7 5637004657
21 10 1991 284857327.5 5921861984
22 11 1991 298116425.1 6219978409
23 12 1991 363653602.6 6583632012我已经尝试了下面的代码,也使用了apply和lambda函数,我一直得到一个答案,但是累积和没有像您在这里看到的那样被计算出来。谢谢你的建议!
testdf['results'] = testdf.groupby(['year','month'])['power_kwh'].cumsum()这就是我一直想要的-
month year power_kwh results
0 1 1990 416491241.4 416491241.4
1 2 1990 343033068.2 343033068.2
2 3 1990 301451921.2 301451921.2
3 4 1990 275719811.1 275719811.1
4 5 1990 292287261.0 292287261.0
5 6 1990 227289410.6 227289410.6
6 7 1990 189335818.5 189335818.5
7 8 1990 115649657.7 115649657.7
8 9 1990 191111685.6 191111685.6
9 10 1990 319101975.4 319101975.4
10 11 1990 366723962.5 366723962.5
11 12 1990 392145037.3 392145037.3
12 1 1991 333036371.8 333036371.8
13 2 1991 325317995.1 325317995.1
14 3 1991 370866553.8 370866553.8
15 4 1991 300957433.0 300957433.0
16 5 1991 223408949.5 223408949.5
17 6 1991 165998110.1 165998110.1
18 7 1991 147475868.2 147475868.2
19 8 1991 158274746.0 158274746.0
20 9 1991 181327778.7 181327778.7
21 10 1991 284857327.5 284857327.5
22 11 1991 298116425.1 298116425.1
23 12 1991 363653602.6 363653602.6发布于 2022-09-16 17:45:20
您的预期输出只能按年分组计算:
df['cumsum_power_kwh'] = df.groupby('year')['power_kwh'].cumsum()如果您的数据不能保证有序,您可以先对其进行排序:
df = df.sort_values(["year", "month"])您遇到的问题是因为cumsum分别应用于每个组,因此对于每个唯一的(year、month)对,cumsum被重置为0。
https://stackoverflow.com/questions/73748541
复制相似问题