I have a df like this:
import pandas as pd

lst_1 = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
lst_2 = [500, 600, 800, 900, 700, 800, 1000, 1200]
lst_3 = ['10/31/2022', '11/02/2022', '11/07/2022', '11/14/2022', '10/31/2022', '11/02/2022', '11/07/2022', '11/14/2022']
df1 = pd.DataFrame(list(zip(lst_1, lst_2, lst_3)),
                   columns=['SKU', 'Sum_Qty_Sold', 'Date_Updated'])
My expected df:
lst_1 = ['A', 'A', 'B', 'B']
lst_2 = [300, 100, 300, 200]
lst_3 = ['10/31/2022-11/07/2022', '11/07/2022-11/14/2022', '10/31/2022-11/07/2022', '11/07/2022-11/14/2022']
result = pd.DataFrame(list(zip(lst_1, lst_2, lst_3)),
                      columns=['SKU', 'Qty_Sold_By_Week', 'Time_Series'])
How do I calculate 'Qty_Sold_By_Week' over each 7-day window? 'Qty_Sold_By_Week' = -('Sum_Qty_Sold' at a given Date_Updated - 'Sum_Qty_Sold' 7 days later, if that row exists)
Posted on 2022-11-15 05:17:49
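# Bucket each date into its calendar week (Monday to Sunday), labelled by the week's start date
grouper = pd.PeriodIndex(df1['Date_Updated'], freq='W').to_timestamp().strftime('%m/%d/%Y')
# Keep the first cumulative total per SKU and week, then reorder the columns
df = (df1.groupby(['SKU', grouper])['Sum_Qty_Sold']
      .first().reset_index().sort_values('SKU').iloc[:, [0, -1, 1]])
# Weekly quantity = next week's total minus this week's total, within each SKU
df['Sum_Qty_Sold'] = df.groupby('SKU')['Sum_Qty_Sold'].shift(-1) - df['Sum_Qty_Sold']
# Build the 'start-end' label from this week's and the next week's start dates
df['Date_Updated'] = df['Date_Updated'] + '-' + df.groupby('SKU')['Date_Updated'].shift(-1)
# The last week of each SKU has no following week, so its NaN row is dropped
result = df.dropna().rename(columns={'Sum_Qty_Sold':'Qty_Sold_By_Week', 'Date_Updated':'Time_Series'})
Output (result):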
  SKU  Qty_Sold_By_Week            Time_Series
0   A             300.0  10/31/2022-11/07/2022
1   A             100.0  11/07/2022-11/14/2022
3   B             300.0  10/31/2022-11/07/2022
4   B             200.0  11/07/2022-11/14/2022
Next time, please don't use column names like these in your example. Names that long aren't needed to solve the problem.
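For comparison, here is a minimal sketch of a different approach, not the one used in the answer above. It applies the question's formula literally by pairing each row with the row of the same SKU dated exactly 7 days later, using a self-merge instead of calendar-week buckets. The names `cur`, `nxt`, `pairs`, `weekly`, `Qty_Next` and `Date_Next` are made up for illustration. It assumes each SKU has at most one row per date, and it silently skips rows that have no partner exactly 7 days later.

```python
import pandas as pd

# Same sample data as in the question.
df1 = pd.DataFrame({
    'SKU': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Sum_Qty_Sold': [500, 600, 800, 900, 700, 800, 1000, 1200],
    'Date_Updated': ['10/31/2022', '11/02/2022', '11/07/2022', '11/14/2022',
                     '10/31/2022', '11/02/2022', '11/07/2022', '11/14/2022'],
})

fmt = '%m/%d/%Y'

# Parse the date strings so that date arithmetic is possible.
cur = df1.assign(Date_Updated=pd.to_datetime(df1['Date_Updated'], format=fmt))

# Copy of the data in which each row is keyed by the date 7 days before its own,
# i.e. the date of the row it should be paired with.
nxt = cur.rename(columns={'Sum_Qty_Sold': 'Qty_Next', 'Date_Updated': 'Date_Next'})
nxt['Date_Updated'] = nxt['Date_Next'] - pd.Timedelta(days=7)

# The inner merge keeps only rows that have a partner exactly 7 days later.
pairs = cur.merge(nxt, on=['SKU', 'Date_Updated'], how='inner')

weekly = pd.DataFrame({
    'SKU': pairs['SKU'],
    # Later cumulative total minus the earlier one.
    'Qty_Sold_By_Week': pairs['Qty_Next'] - pairs['Sum_Qty_Sold'],
    'Time_Series': (pairs['Date_Updated'].dt.strftime(fmt) + '-'
                    + pairs['Date_Next'].dt.strftime(fmt)),
})
print(weekly)
```

On the sample data this gives the same four rows as the week-bucket version, with integer quantities because no NaN is introduced. The two approaches diverge when updates do not fall exactly 7 days apart: the week-bucket version still pairs consecutive calendar weeks, while this one drops the unmatched rows.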
https://stackoverflow.com/questions/74440624