给出如下样本数据,我希望每周提取一个数据输入,如果每周有多个条目,那么我将使用该周最大的工作日数据:
date variable value
0 2020-11-4 quantity 564.0000
1 2020-11-11 quantity 565.0000
2 2020-11-18 quantity 566.0000
3 2020-11-25 quantity 566.0000
4 2020-11-2 price 1829.1039
5 2020-11-3 price 1789.5883
6 2020-11-4 price 1755.4307
7 2020-11-5 price 1750.0727
8 2020-11-6 price 1746.7239
9 2020-11-9 price 1756.1005
10 2020-11-10 price 1752.0820
11 2020-11-11 price 1814.3693
12 2020-11-12 price 1833.7922
13 2020-11-13 price 1833.7922
14 2020-11-16 price 1784.2302
15 2020-11-17 price 1764.1376
16 2020-11-18 price 1770.1654
17 2020-11-19 price 1757.4400
18 2020-11-20 price 1770.1654要获取每个约会的周数,我使用df['week_number'] = pd.to_datetime(df['date']).dt.week。
date variable value week_number
0 2020-11-4 quantity 564.0000 45 --> to keep
1 2020-11-11 quantity 565.0000 46 --> to keep
2 2020-11-18 quantity 566.0000 47 --> to keep
3 2020-11-25 quantity 566.0000 48 --> to keep
4 2020-11-2 price 1829.1039 45
5 2020-11-3 price 1789.5883 45
6 2020-11-4 price 1755.4307 45
7 2020-11-5 price 1750.0727 45
8 2020-11-6 price 1746.7239 45 --> to keep, since it's the largest weekday for this week
9 2020-11-9 price 1756.1005 46
10 2020-11-10 price 1752.0820 46
11 2020-11-11 price 1814.3693 46
12 2020-11-12 price 1833.7922 46
13 2020-11-13 price 1833.7922 46 --> to keep, since it's the largest weekday for this week
14 2020-11-16 price 1784.2302 47
15 2020-11-17 price 1764.1376 47
16 2020-11-18 price 1770.1654 47
17 2020-11-19 price 1757.4400 47
18 2020-11-20 price 1770.1654 47 --> to keep, since it's the largest weekday for this week最后,我将将表示to_keep的行重组为预期结果,如下所示:
variable the_45th_week the_46th_week the_47th_week the_48th_week
0 quantity 564.0000 565.0000 566.0000 566.0
1 price 1756.1005 1833.7922 1770.1654 NaN如何操作数据以获得预期的结果?真诚的感谢。
编辑:
df = df.sort_values(by=['variable','date'], ascending=False)
df.drop_duplicates(['variable', 'week_number'], keep='last')退出:
date variable value week_number
0 2020-11-4 quantity 564.0000 45
3 2020-11-25 quantity 566.0000 48
2 2020-11-18 quantity 566.0000 47
1 2020-11-11 quantity 565.0000 46
4 2020-11-2 price 1829.1039 45
14 2020-11-16 price 1784.2302 47
10 2020-11-10 price 1752.0820 46发布于 2021-11-29 11:49:58
在您的解决方案中,可以使用pivot添加rename
df['week_number'] = pd.to_datetime(df['date']).dt.week
df = df.sort_values(by=['variable','date'], ascending=False)
df = df.drop_duplicates(['variable', 'week_number'], keep='last')
f = lambda x: f'the_{x}th_week'
out = df.pivot('variable','week_number','value').rename(columns=f)
print(out)
week_number the_45th_week the_46th_week the_47th_week the_48th_week
variable
price 1829.1039 1752.082 1784.2302 NaN
quantity 564.0000 565.000 566.0000 566.0或者删除DataFrame.drop_duplicates,因此可以将DataFrame.pivot_table与聚合函数last一起使用
df['week_number'] = pd.to_datetime(df['date']).dt.week
df = df.sort_values(by=['variable','date'], ascending=False)
f = lambda x: f'the_{x}th_week'
out = df.pivot_table(index='variable',columns='week_number',values='value', aggfunc='last').rename(columns=f)编辑:以获得与预期结果完全相同的结果:
out.reset_index().rename_axis(None, axis=1)退出:
variable the_45th_week the_46th_week the_47th_week the_48th_week
0 price 1829.1039 1752.082 1784.2302 NaN
1 quantity 564.0000 565.000 566.0000 566.0https://stackoverflow.com/questions/70153963
复制相似问题