Extract weekly data from daily data and reshape it from long to wide format with Pandas

Stack Overflow user
Asked on 2021-11-29 11:36:20
Answers: 1 · Views: 109 · Followers: 0 · Votes: 1

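Given the sample data below, I want to extract one entry per week. If a week has several entries, I want to keep the entry for the latest weekday in that week: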

          date  variable      value
0    2020-11-4  quantity   564.0000
1   2020-11-11  quantity   565.0000
2   2020-11-18  quantity   566.0000
3   2020-11-25  quantity   566.0000
4    2020-11-2     price  1829.1039
5    2020-11-3     price  1789.5883
6    2020-11-4     price  1755.4307
7    2020-11-5     price  1750.0727
8    2020-11-6     price  1746.7239
9    2020-11-9     price  1756.1005
10  2020-11-10     price  1752.0820
11  2020-11-11     price  1814.3693
12  2020-11-12     price  1833.7922
13  2020-11-13     price  1833.7922
14  2020-11-16     price  1784.2302
15  2020-11-17     price  1764.1376
16  2020-11-18     price  1770.1654
17  2020-11-19     price  1757.4400
18  2020-11-20     price  1770.1654

To get the week number for each date, I use df['week_number'] = pd.to_datetime(df['date']).dt.week

          date  variable      value  week_number
0    2020-11-4  quantity   564.0000           45  --> to keep
1   2020-11-11  quantity   565.0000           46  --> to keep
2   2020-11-18  quantity   566.0000           47  --> to keep
3   2020-11-25  quantity   566.0000           48  --> to keep
4    2020-11-2     price  1829.1039           45
5    2020-11-3     price  1789.5883           45
6    2020-11-4     price  1755.4307           45
7    2020-11-5     price  1750.0727           45
8    2020-11-6     price  1746.7239           45  --> to keep, since it's the largest weekday for this week
9    2020-11-9     price  1756.1005           46
10  2020-11-10     price  1752.0820           46
11  2020-11-11     price  1814.3693           46
12  2020-11-12     price  1833.7922           46
13  2020-11-13     price  1833.7922           46  --> to keep, since it's the largest weekday for this week
14  2020-11-16     price  1784.2302           47
15  2020-11-17     price  1764.1376           47
16  2020-11-18     price  1770.1654           47
17  2020-11-19     price  1757.4400           47
18  2020-11-20     price  1770.1654           47  --> to keep, since it's the largest weekday for this week
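
For readers on pandas 2.0 or later, where the Series.dt.week accessor is no longer available, the week numbers in the table can be reproduced with Series.dt.isocalendar(). The snippet below is a minimal sketch on four dates taken from the sample; the names dates, parsed, week_number and weekday are illustrative only.

```python
import pandas as pd

# Four dates copied from the sample data (strings, not zero-padded).
dates = pd.Series(["2020-11-2", "2020-11-6", "2020-11-9", "2020-11-25"])
parsed = pd.to_datetime(dates)

# ISO week number of each date, converted to a plain integer dtype.
week_number = parsed.dt.isocalendar().week.astype(int)

# Day of the week, with Monday=0 and Sunday=6.
weekday = parsed.dt.dayofweek

print(week_number.tolist())  # [45, 45, 46, 48]
print(weekday.tolist())      # [0, 4, 0, 2]
```

The weekday values make the "largest weekday" rule explicit: within week 45, 2020-11-6 (Friday, 4) is later than 2020-11-2 (Monday, 0).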

Finally, I want to reshape the rows marked "to keep" into the expected result shown below:

   variable  the_45th_week  the_46th_week  the_47th_week  the_48th_week
0  quantity       564.0000       565.0000       566.0000          566.0
1     price      1746.7239      1833.7922      1770.1654            NaN

How can I manipulate the data to get the expected result? Many thanks.

Edit:

df = df.sort_values(by=['variable','date'], ascending=False)
df.drop_duplicates(['variable', 'week_number'], keep='last')

Output:

          date  variable      value  week_number
0    2020-11-4  quantity   564.0000           45
3   2020-11-25  quantity   566.0000           48
2   2020-11-18  quantity   566.0000           47
1   2020-11-11  quantity   565.0000           46
4    2020-11-2     price  1829.1039           45
14  2020-11-16     price  1784.2302           47
10  2020-11-10     price  1752.0820           46

1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-11-29 11:49:58

Building on your solution, you can add pivot followed by rename:

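import pandas as pd

# .dt.week was removed in pandas 2.0; isocalendar().week gives the same ISO week number
df['week_number'] = pd.to_datetime(df['date']).dt.isocalendar().week.astype(int)
df = df.sort_values(by=['variable','date'], ascending=False)
df = df.drop_duplicates(['variable', 'week_number'], keep='last')

# turn a week number such as 45 into a column label such as 'the_45th_week'
f = lambda x: f'the_{x}th_week'
# pivot accepts keyword arguments only in pandas 2.0 and later
out = df.pivot(index='variable', columns='week_number', values='value').rename(columns=f)
print(out)
week_number  the_45th_week  the_46th_week  the_47th_week  the_48th_week
variable                                                               
price            1829.1039       1752.082      1784.2302            NaN
quantity          564.0000        565.000       566.0000          566.0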
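
One thing to watch for: the date column in the sample holds strings, so sort_values compares them lexicographically, and with ascending=False plus keep='last' the row that survives in each week is the smallest date string (2020-11-2, 2020-11-10 and 2020-11-16 in the output shown in the question). The following is a minimal sketch, assuming the goal is the latest calendar date per week as the question describes. It parses the dates first and sorts them in ascending order. The names latest and out are illustrative only.

```python
import pandas as pd

# Sample rows from the question (dates are plain strings, not zero-padded).
df = pd.DataFrame({
    "date": ["2020-11-4", "2020-11-11", "2020-11-18", "2020-11-25",
             "2020-11-2", "2020-11-3", "2020-11-4", "2020-11-5", "2020-11-6",
             "2020-11-9", "2020-11-10", "2020-11-11", "2020-11-12", "2020-11-13",
             "2020-11-16", "2020-11-17", "2020-11-18", "2020-11-19", "2020-11-20"],
    "variable": ["quantity"] * 4 + ["price"] * 15,
    "value": [564.0, 565.0, 566.0, 566.0,
              1829.1039, 1789.5883, 1755.4307, 1750.0727, 1746.7239,
              1756.1005, 1752.0820, 1814.3693, 1833.7922, 1833.7922,
              1784.2302, 1764.1376, 1770.1654, 1757.4400, 1770.1654],
})

# Parse the strings so that sorting is chronological rather than lexicographic.
df["date"] = pd.to_datetime(df["date"])
df["week_number"] = df["date"].dt.isocalendar().week.astype(int)

# Sort ascending by date, then keep the last (latest) row of each variable/week pair.
latest = (df.sort_values(["variable", "date"])
            .drop_duplicates(["variable", "week_number"], keep="last"))

# Reshape from long to wide: one row per variable, one column per week.
out = (latest.pivot(index="variable", columns="week_number", values="value")
             .rename(columns=lambda w: f"the_{w}th_week"))
print(out)
```

With the sample data, the price row then holds the Friday values 1746.7239, 1833.7922 and 1770.1654 for weeks 45 to 47, which are the rows the question marks as "to keep".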

Alternatively, remove the DataFrame.drop_duplicates step and use DataFrame.pivot_table with the aggregation function last:

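# .dt.week was removed in pandas 2.0; isocalendar().week gives the same ISO week number
df['week_number'] = pd.to_datetime(df['date']).dt.isocalendar().week.astype(int)
df = df.sort_values(by=['variable','date'], ascending=False)

f = lambda x: f'the_{x}th_week'
# aggfunc='last' keeps the last row of each (variable, week_number) group in the sorted order
out = df.pivot_table(index='variable', columns='week_number', values='value', aggfunc='last').rename(columns=f)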
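
Because aggfunc='last' simply takes the last row of each group in the frame's current order, the result depends on how the rows were sorted beforehand. The sketch below uses a deliberately shuffled subset of the sample rows to illustrate this. It assumes the dates are parsed and sorted ascending so that "last" means the latest date. The name wide is illustrative only.

```python
import pandas as pd

# A shuffled subset of the sample rows: two price rows per week, one quantity row.
df = pd.DataFrame({
    "date": ["2020-11-6", "2020-11-2", "2020-11-13", "2020-11-9", "2020-11-4"],
    "variable": ["price", "price", "price", "price", "quantity"],
    "value": [1746.7239, 1829.1039, 1833.7922, 1756.1005, 564.0],
})
df["date"] = pd.to_datetime(df["date"])
df["week_number"] = df["date"].dt.isocalendar().week.astype(int)

# Sorting by the parsed date first makes 'last' pick the latest date in each group.
wide = (df.sort_values("date")
          .pivot_table(index="variable", columns="week_number",
                       values="value", aggfunc="last"))
print(wide)
```

Here the price row should contain the values from 2020-11-6 and 2020-11-13, and quantity has no entry for week 46, so that cell is NaN.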

Edit: to get exactly the same layout as the expected result:

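# move 'variable' from the index into a column and drop the 'week_number' label on the columns
out = out.reset_index().rename_axis(None, axis=1)
print(out)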

Output:

   variable  the_45th_week  the_46th_week  the_47th_week  the_48th_week
0     price      1829.1039       1752.082      1784.2302            NaN
1  quantity       564.0000        565.000       566.0000          566.0
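
The pivoted frame lists price before quantity because pivot sorts the index labels, whereas the expected result in the question lists quantity first. If the row order matters, one possible approach is to reindex before flattening. The sketch below builds a small stand-in for the pivoted frame by hand; the frame contents and the name flat are assumptions for illustration.

```python
import pandas as pd

# Hand-built stand-in for a pivoted result: 'variable' as index, weeks as columns.
out = pd.DataFrame(
    {"the_45th_week": [1746.7239, 564.0], "the_46th_week": [1833.7922, 565.0]},
    index=pd.Index(["price", "quantity"], name="variable"),
)
out.columns.name = "week_number"

flat = (out.reindex(["quantity", "price"])   # put the rows in the desired order
           .reset_index()                    # turn 'variable' into a regular column
           .rename_axis(None, axis=1))       # remove the 'week_number' columns label
print(flat)
```

With real data, the order could instead be taken from the original long frame, for example from the order in which each variable first appears.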
Votes: 1
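The original page content is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: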

https://stackoverflow.com/questions/70153963
