我有一只熊猫df如下:
YEAR MONTH USERID TRX_COUNT
2020 1 1 1
2020 2 1 2
2020 3 1 1
2020 12 1 1
2021 1 1 3
2021 2 1 3
2021 3 1 4我想要sum TRX_COUNT这样,每个TRX_COUNT都是未来12个月的sum of TRX_COUNTS。所以我的最终结果应该是
YEAR MONTH USERID TRX_COUNT TRX_COUNT_SUM
2020 1 1 1 5
2020 2 1 2 7
2020 3 1 1 8
2020 12 1 1 11
2021 1 1 3 10
2021 2 1 3 7
2021 3 1 4 4例如,TRX_COUNT_SUM For 2020/1是1+2+1+1=5前12个月的计数。
我感到困惑的有两个方面:
cumsum的各种变体和按USERID, YR, MONTH分组,但在处理时间窗口时遇到了错误,因为可能存在用户没有事务的MONTHS,这些事务必须考虑。例如,在2020/1中,用户在4-11个月内没有事务,因此全年的事务计数为5.。
最后,
2021/3 (左为4)一样)。对如何处理这件事有什么想法吗?谢谢!
发布于 2022-05-13 18:06:22
我能够使用numpy数组、熊猫和索引的组合来完成这一任务。
import pandas as pd
import numpy as np
#df = your dataframe
df_dates = pd.DataFrame(np.arange(np.datetime64('2020-01-01'), np.datetime64('2021-04-01'), np.timedelta64(1, 'M'), dtype='datetime64[M]').astype('datetime64[D]'), columns = ['DATE'])
df_dates['YEAR'] = df_dates['DATE'].apply(lambda x : str(x).split('-')[0]).apply(lambda x : int(x))
df_dates['MONTH'] = df_dates['DATE'].apply(lambda x : str(x).split('-')[1]).apply(lambda x : int(x))
df_merge = df_dates.merge(df, how = 'left')
df_merge.replace(np.nan, 0, inplace=True)
df_merge.reset_index(inplace = True)
for i in range(0, len(df_merge)):
max_index = df_merge['index'].max()
if(i + 11 < max_index):
df_merge.at[i, 'TRX_COUNT_SUM'] = df_merge.iloc[i:i + 12]['TRX_COUNT'].sum()
elif(i != max_index):
df_merge.at[i, 'TRX_COUNT_SUM'] = df_merge.iloc[i:max_index + 1]['TRX_COUNT'].sum()
else:
df_merge.at[i, 'TRX_COUNT_SUM'] = df_merge.iloc[i]['TRX_COUNT']
final_df = pd.merge(df_merge, df)发布于 2022-05-13 18:25:33
试试这个:
# Set the Dataframe index to a time series constructed from YEAR and MONTH
ts = pd.to_datetime(df.assign(DAY=1)[["YEAR", "MONTH", "DAY"]])
df.set_index(ts, inplace=True)
df["TRX_COUNT_SUM"] = (
# Reindex the dataframe with every missing month in-between
# Also reverse the index so that rolling(12) means 12 months
# forward instead of backward
df.reindex(pd.date_range(ts.min(), ts.max(), freq="MS")[::-1])
# Roll and sum
.rolling(12, min_periods=1)
["TRX_COUNT"].sum()
)https://stackoverflow.com/questions/72231800
复制相似问题