我有几个星期,几个月,几年的df。
week = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']
month = ["January", "January", "January", "January"]
year = [2022, 2022, 2022, 2022]
myDict = {}
myDict["Week"] = week
myDict["Month"] = month
myDict["Year"] = year
dates_df = pd.DataFrame(data=myDict)
dates_df Week Month Year
01/03/2022 - 01/09/2022 January 2022
01/10/2022 - 01/16/2022 January 2022
01/17/2022 - 01/23/2022 January 2022
01/24/2022 - 01/30/2022 January 2022数据看起来类似于(以前分组):
test_data = {'CLient Id': [1,1,1,1,2,2,2,3,3],
'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Anna Delvey', 'Anna Delvey'],
'City': ['New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Los Angeles', 'New York', 'New York'],
'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
'01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/24/2022 - 01/30/2022', '01/03/2022 - 01/09/2022',
'01/24/2022 - 01/30/2022'],
'Month': ['January', 'January', 'January', 'January', 'January', 'January', 'January', 'January', 'January'],
'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200]}
test_df = pd.DataFrame(data=test_data)
test_dfClient Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200我需要为每个缺少一周的客户端(Id)创建额外的客户端名称行,即城市、周、月和年,在本例中,df + be列应该是0。
我需要的产出:
Client Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0
3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200我试着对它进行编码,但没有运气。
发布于 2022-03-31 20:19:51
您可以使用pivot + reindex + fillna (获取丢失的数据)+ stack (返回到以前的形状):
columns = ['Week','Month','Year']
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, ['Spent'])
.reindex(pd.MultiIndex.from_arrays(dates_df.assign(Spent='Spent').to_numpy()[:, [-1,0,1,2]].T,
names=[None]+columns), axis=1)
.fillna(0).stack(level=columns).reset_index())输出:
CLient Id Client Name City Week Month Year Spent
0 1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000.0
1 1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750.0
2 1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200.0
3 1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850.0
4 2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777.0
5 2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500.0
6 2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0.0
7 2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200.0
8 3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500.0
9 3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0.0
10 3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0.0
11 3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200.0 如果有多个列要填充0,则可以使用:
columns = ['Week','Month','Year']
value_columns = ['Spent', ...]
new_df = (pd.DataFrame({'key':[1]*len(value_columns),'New':value_columns})
.merge(dates_df.assign(key=1)).drop(columns='key'))
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, value_columns)
.reindex(pd.MultiIndex.from_arrays(zip(*new_df.to_numpy()),
names=[None] + columns), axis=1)
.fillna(0).stack(level=columns).reset_index())https://stackoverflow.com/questions/71698633
复制相似问题