I have two DataFrames, as shown below:
import pandas as pd
df_A = pd.DataFrame({'Date': ['1/1/2016', '1/2/2016', '1/3/2016', '1/4/2016', '1/5/2016', '1/6/2016', '1/7/2016', '1/8/2016', '1/9/2016', '1/10/2016', '1/11/2016', '1/12/2016', '1/13/2016', '1/14/2016', '1/15/2016', '1/16/2016', '1/17/2016', '1/18/2016', '1/19/2016', '1/20/2016', '1/21/2016', '1/22/2016', '1/23/2016', '1/24/2016', '1/25/2016', '1/26/2016', '1/27/2016', '1/28/2016', '1/29/2016', '1/30/2016', '1/31/2016', '2/1/2016', '2/2/2016', '2/3/2016', '2/4/2016', '2/5/2016', '2/6/2016', '2/7/2016'],
'445_Week': [20160101, 20160101, 20160101, 20160101, 20160101, 20160101, 20160101, 20160101, 20160102, 20160102, 20160102, 20160102, 20160102, 20160102, 20160102, 20160103, 20160103, 20160103, 20160103, 20160103, 20160103, 20160103, 20160103, 20160104, 20160104, 20160104, 20160104, 20160104, 20160104, 20160104, 20160104, 20160201, 20160201, 20160201, 20160201, 20160201, 20160201, 20160201],
'Week': ['1','1','1','1','1','1','1','2','2','2','2','2','2','2','3','3','3','3','3','3','3','3','4','4','4','4','4','4','4','4','1','1','1','1','1','1','1','1',],
'Sales': ['10', '15', '20', '15','10','20', '10','15', '10', '15','20', '15','10','20', '10','15','10', '15', '20', '15','10','20', '10','15', '10', '15','20', '15','10','20', '10','15', '10','15', '20', '15','10','20']})
df_B = pd.DataFrame({'Date': ['1/1/2016','1/2/2016', '1/3/2016', '1/4/2016','2/1/2016'],
'445_Week': [20160101, 20160102, 20160103, 20160104, 20160201],
'Week': ['1', '2', '3', '4', '5'],
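'Sales': ['10','15', '20', '15', '10']})
I am using a 4-4-5 calendar, as shown in the '445_Week' column above. My goal is to replace the 'Date' column in df_B with the correct 'Date' from df_A, in 'm/d/yyyy' format. I want to do this by taking the minimum date of each '445_Week' group in df_A. This is the ideal end result: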
df_C = pd.DataFrame({'Date': ['1/1/2016','1/9/2016', '1/16/2016', '1/24/2016','2/1/2016'],
'445_Week': [20160101, 20160102, 20160103, 20160104, 20160201],
'Week': ['1', '2', '3', '4', '5'],
'Sales': ['114.375','14.285', '14.375', '14.375', '15']})
Note that the Sales values in the final data are simply the mean of the values in each group.
Here is what I have tried so far:
dfc = df_A.groupby('445_Week')['Date']
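new_df = df_A.assign(Date = dfc.transform(min))
This only creates a new DataFrame from df_A in which each row carries the minimum date of its 445_Week group. I believe the next step is to merge the two DataFrames, but I am not sure that is correct.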
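Posted on 2022-08-03 16:50:12
You can simply replace the data in the columns you need, computing the other values as you go. I convert the 'Sales' column of new_df to float and the 'Date' column of df_A to datetime. If the 'Date' column of df_A is not converted, the minimum is taken over strings, and the row at index 1 comes out wrong (it would be 1/10/2016). Your first mean is listed as 114.375; it should be 14.375000.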
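To illustrate why the conversion matters, here is a minimal sketch (not part of the original answer) that takes the minimum of the second 4-4-5 week's dates from df_A, once as strings and once as datetimes. The names `week2`, `string_min`, and `datetime_min` are made up for this illustration, and the explicit `format` argument is an assumption about how the dates are laid out.

```python
import pandas as pd

# Dates of the second 4-4-5 week in df_A (1/9/2016 through 1/15/2016).
week2 = pd.Series(['1/9/2016', '1/10/2016', '1/11/2016', '1/12/2016',
                   '1/13/2016', '1/14/2016', '1/15/2016'])

# As strings, values are compared character by character,
# so '1/10/2016' sorts before '1/9/2016'.
string_min = week2.min()

# As datetimes, values are compared chronologically.
datetime_min = pd.to_datetime(week2, format='%m/%d/%Y').min()

print(string_min)    # 1/10/2016
print(datetime_min)  # 2016-01-09 00:00:00
```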
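# Parse the dates so that min() compares chronologically, not as strings
df_A['Date'] = pd.to_datetime(df_A['Date'])
dfc = df_A.groupby('445_Week')['Date']
# Give every row the earliest date of its 445_Week group
new_df = df_A.assign(Date=dfc.transform('min'))
new_df['Sales'] = new_df['Sales'].astype(float)
# Mean sales per group, indexed by the group's earliest date
aaa = new_df.groupby('Date')['Sales'].mean()
# Positional assignment: relies on df_B being sorted by 445_Week
df_B['Date'] = aaa.index
df_B['Sales'] = aaa.values
print(df_B)
Output: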
        Date  445_Week Week      Sales
0 2016-01-01  20160101    1  14.375000
1 2016-01-09  20160102    2  14.285714
2 2016-01-16  20160103    3  14.375000
3 2016-01-24  20160104    4  14.375000
4 2016-02-01  20160201    5  15.000000
If the 'Date' column of df_B is needed in the original format, it can be converted back to a string:
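# "%-m" and "%-d" (no zero padding) are platform-specific; they work with glibc but not on Windows
df_B['Date'] = df_B['Date'].dt.strftime("%-m/%-d/%Y")
Output: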
        Date  445_Week Week      Sales
0   1/1/2016  20160101    1  14.375000
1   1/9/2016  20160102    2  14.285714
2  1/16/2016  20160103    3  14.375000
3  1/24/2016  20160104    4  14.375000
4   2/1/2016  20160201    5  15.000000
https://stackoverflow.com/questions/73210205
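The merge the asker had in mind can also work, and it matches rows by '445_Week' instead of relying on row order. The following is a minimal sketch under stated assumptions, not the answerer's method: the two DataFrames are shortened stand-ins for df_A and df_B, `tmp` and `summary` are names chosen for this illustration, and the date strings are built with an f-string to avoid platform-specific strftime directives.

```python
import pandas as pd

# Shortened stand-ins for df_A and df_B (two weeks only).
df_A = pd.DataFrame({
    'Date': ['1/1/2016', '1/2/2016', '1/9/2016', '1/10/2016'],
    '445_Week': [20160101, 20160101, 20160102, 20160102],
    'Sales': ['10', '15', '10', '20'],
})
df_B = pd.DataFrame({
    'Date': ['1/1/2016', '1/2/2016'],
    '445_Week': [20160102, 20160101],   # deliberately out of order
    'Week': ['2', '1'],
    'Sales': ['10', '15'],
})

# Convert types first so min() is chronological and mean() is numeric.
tmp = df_A.assign(Date=pd.to_datetime(df_A['Date'], format='%m/%d/%Y'),
                  Sales=df_A['Sales'].astype(float))

# One row per 4-4-5 week: earliest date and mean sales.
summary = tmp.groupby('445_Week', as_index=False).agg(
    Date=('Date', 'min'),
    Sales=('Sales', 'mean'),
)

# Drop the stale columns from df_B, then match rows by key, not by position.
df_C = df_B.drop(columns=['Date', 'Sales']).merge(summary, on='445_Week', how='left')

# Build m/d/yyyy strings without zero padding in a portable way.
df_C['Date'] = [f'{d.month}/{d.day}/{d.year}' for d in df_C['Date']]
print(df_C)
```

Because the join is on the key, the week 20160102 row gets 1/9/2016 and a mean of 15.0 even though it comes first in this df_B. A left merge keeps every row of df_B; a week missing from df_A would yield a missing date, which the f-string step above does not handle.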