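My dataset looks like the one below. It contains purchase records by zip code. I want to create a new DataFrame holding the cumulative number of purchases per month for each zip code. The problem is that some months have no activity at all.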
ID Zipcode Date Purchase
1 9999 2018-12-24 1
2 9999 2018-12-26 1
3 9999 2019-3-14 1
4 9999 2019-4-8 1
5 2400 2018-12-12 1
6 2400 2018-12-14 1
7 2400 2019-1-15 1
8 2400 2019-2-30 1

Ideally, the DataFrame should look like this:
Zipcode Period Cumulative purchases
9999 December 2018 2
9999 January 2019 2
9999 February 2019 2
9999 March 2019 3
9999 April 2019 4
9999 May 2019 4
2400 December 2018 2
2400 January 2019 3
2400 February 2019 4
2400 March 2019 4
2400 April 2019 4
etc.

Posted on 2021-09-09 09:16:33
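You can use resample to aggregate Purchase by month with sum within each Zipcode, then group by the first index level (here Zipcode) and take the cumulative sum: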
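import pandas as pd

# Parse the dates so they can serve as a DatetimeIndex for resample
df['Date'] = pd.to_datetime(df['Date'])
df = (df.set_index('Date')
        .groupby('Zipcode', sort=False)    # keep zip codes in order of appearance
        .resample('MS')['Purchase'].sum()  # monthly totals; months with no rows sum to 0
        .groupby(level=0)                  # group again by Zipcode (index level 0)
        .cumsum()                          # running total within each zip code
        .reset_index(name='Cumulative purchases'))
# Format the month-start timestamps as e.g. "December 2018"
df['Date'] = df['Date'].dt.strftime('%B %Y')
print(df)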
Zipcode Date Cumulative purchases
0 9999 December 2018 2
1 9999 January 2019 2
2 9999 February 2019 2
3 9999 March 2019 3
4 9999 April 2019 4
5 2400 December 2018 2
6 2400 January 2019 3
7 2400 February 2019 4

https://stackoverflow.com/questions/69115319
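
Note that resample only fills the gaps between each zip code's own first and last purchase, which is why 2400 stops at February 2019 in the output above. If every zip code should cover the same range of months, as in the desired output in the question, one possible approach is to build the monthly totals as a wide table and reindex it over a shared month range. This is only a sketch, not part of the original answer. The names `monthly`, `all_months`, `cumulative` and `result` are made up for illustration, and the last date is assumed to be 2019-02-28 because February 30 is not a valid calendar date.

```python
import pandas as pd

# Sample data from the question; the last date is an assumption
# (2019-02-28 instead of the invalid 2019-2-30).
df = pd.DataFrame({
    "Zipcode": [9999, 9999, 9999, 9999, 2400, 2400, 2400, 2400],
    "Date": ["2018-12-24", "2018-12-26", "2019-03-14", "2019-04-08",
             "2018-12-12", "2018-12-14", "2019-01-15", "2019-02-28"],
    "Purchase": [1, 1, 1, 1, 1, 1, 1, 1],
})
df["Date"] = pd.to_datetime(df["Date"])

# Monthly totals as a wide table: one row per month, one column per zip code.
monthly = (df.assign(Period=df["Date"].dt.to_period("M"))
             .groupby(["Zipcode", "Period"])["Purchase"].sum()
             .unstack("Zipcode", fill_value=0))

# Shared range of months across all zip codes (first to last month in the data).
all_months = pd.period_range(monthly.index.min(), monthly.index.max(),
                             freq="M", name="Period")

# Months missing from the data get 0 purchases, then accumulate per zip code.
cumulative = (monthly.reindex(all_months, fill_value=0)
                     .cumsum()
                     .rename_axis(index="Period"))

# Back to long format: one row per (Zipcode, Period) pair.
result = (cumulative.unstack()
                    .rename("Cumulative purchases")
                    .reset_index())
result["Period"] = result["Period"].dt.strftime("%B %Y")
print(result)
```

To extend the range beyond the last purchase in the data (for example up to May 2019), pass an explicit end month to `pd.period_range` instead of `monthly.index.max()`.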