我有一个数据帧,其中包含:
State Country Date Cases
0 NaN Afghanistan 2020-01-22 0
271 NaN Afghanistan 2020-01-23 0
... ... ... ... ...
85093 NaN Zimbabwe 2020-11-30 9950
85364 NaN Zimbabwe 2020-12-01 10129我正在尝试创建一个新的累积案例列,但按国家和州分组。
State Country Date Cases Total Cases
231 California USA 2020-01-22 5 5
342 California USA 2020-01-23 10 15
233 Texas USA 2020-01-22 4 4
322 Texas USA 2020-01-23 12 16我一直在尝试关注Pandas groupby cumulative sum,并尝试了以下内容:
df['Total'] = df.groupby(['State','Country'])['Cases'].cumsum()返回一系列-1
df['Total'] = df.groupby(['State', 'Country']).sum() \
.groupby(level=0).cumsum().reset_index()返回总和。
df['Total'] = df.groupby(['Country'])['Cases'].apply(lambda x: x.cumsum())不会按州分开求和。
df_f['Total'] = df_f.groupby(['Region','State'])['Cases'].apply(lambda x: x.cumsum())除了“状态”为“NaN”时,“Total”也为NaN时,此方法才起作用。
发布于 2020-12-27 11:45:59
arrays = [['California', 'California', 'Texas', 'Texas'],
['USA', 'USA', 'USA', 'USA'],
['2020-01-22','2020-01-23','2020-01-22','2020-01-23'], [5,10,4,12]]
df = pd.DataFrame(list(zip(*arrays)), columns = ['State', 'Country', 'Date', 'Cases'])
df
State Country Date Cases
0 California USA 2020-01-22 5
1 California USA 2020-01-23 10
2 Texas USA 2020-01-22 4
3 Texas USA 2020-01-23 12
temp = df.set_index(['State', 'Country','Date'], drop=True).sort_index( )
df['Total Cases'] = temp.groupby(['State', 'Country']).cumsum().reset_index()['Cases']
df
State Country Date Cases Total Cases
0 California USA 2020-01-22 5 5
1 California USA 2020-01-23 10 15
2 Texas USA 2020-01-22 4 4
3 Texas USA 2020-01-23 12 16https://stackoverflow.com/questions/65460252
复制相似问题