
Pandas groupby difference between the two most recent dates

Stack Overflow user
Asked on 2020-11-16 19:38:15

Pandas version: 1.1.0

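Hello, I am trying to find the difference in value between the last two dates in date_collected. The code works fine with consecutive dates, but I am stuck on how to handle weekends: no data is collected then, which leaves a two-day gap. This causes groupby.diff() to ignore them.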

With the code below, I can slice out two specific dates and get the result using .reindex_like().

# Working example when slicing on two specific dates
prior_date = df.loc[df['date_collected'] == '2020-11-06']
current = df.loc[df['date_collected'] == '2020-11-09']

# Index both snapshots by date so they can be aligned
prior_date = prior_date.set_index('date')['value']
current = current.set_index('date')['value']
# Dates missing from the earlier snapshot count as 0
prior_date = prior_date.reindex_like(current).fillna(0)

# Keep only the dates whose value changed (a new name avoids overwriting df)
diff = (current - prior_date).reset_index()
change = diff[diff['value'] != 0].dropna(axis=0)
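The two dates in the snippet above are hard-coded. As a minimal sketch, assuming the `date_collected` values sort chronologically (ISO-formatted strings or datetimes), the two most recent collection dates can be picked by position instead of by calendar day, so a weekend gap between them does not matter. The helper name `diff_last_two` and the small `sample` frame are illustrative and not part of the original question.

```python
import pandas as pd


def diff_last_two(df):
    """Difference in `value` between the two most recent collection dates.

    Hypothetical helper: dates absent from the earlier snapshot count as 0.
    """
    # Sorted unique collection dates; the last two are taken by position,
    # so it does not matter how many calendar days lie between them.
    collected = sorted(df['date_collected'].unique())
    prior_day, current_day = collected[-2], collected[-1]

    prior = df.loc[df['date_collected'] == prior_day].set_index('date')['value']
    current = df.loc[df['date_collected'] == current_day].set_index('date')['value']

    # Align the earlier snapshot to the later one, filling new dates with 0.
    prior = prior.reindex_like(current).fillna(0)
    diff = (current - prior).reset_index()
    return diff[diff['value'] != 0]


# Small illustrative frame: Friday and Monday snapshots with a weekend gap.
sample = pd.DataFrame({
    'date_collected': ['2020-11-06'] * 2 + ['2020-11-09'] * 3,
    'date': ['2020-11-04', '2020-11-05',
             '2020-11-04', '2020-11-05', '2020-11-06'],
    'value': [3, 1, 3, 3, 5],
})
result = diff_last_two(sample)
print(result)
```

In this sample the unchanged date 2020-11-04 is filtered out, while 2020-11-05 (changed) and 2020-11-06 (new in the later snapshot) remain.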

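However, when I try to reproduce these results for the whole DataFrame, I cannot find a way to use reindex_like on a MultiIndex. I tried .last(), but then realized that the missing weekends become a problem.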

# (current result down below)
chng = df.set_index(['date_collected', 'date'])['value']
chng = chng.groupby(level=1).diff().reset_index()

last = df.groupby('date_collected')[['date', 'value']].last().reset_index()
chng = chng.set_index(['date_collected', 'value'])
last = last.set_index(['date_collected', 'value'])

chng = chng.fillna(last)
chng = chng[chng['value'] != 0].dropna()
# input data
+----------------+------------+-------+
| date_collected |    date    | value |
+----------------+------------+-------+
| 2020-11-06     | 2020-11-01 |     4 |
| 2020-11-06     | 2020-11-02 |     5 |
| 2020-11-06     | 2020-11-03 |     1 |
| 2020-11-06     | 2020-11-04 |     3 |
| 2020-11-06     | 2020-11-05 |     1 |
| 2020-11-09     | 2020-11-04 |     3 |
| 2020-11-09     | 2020-11-05 |     3 |
| 2020-11-09     | 2020-11-06 |     5 |
| 2020-11-09     | 2020-11-07 |     1 |
| 2020-11-09     | 2020-11-08 |     1 |
| 2020-11-10     | 2020-11-05 |     3 |
| 2020-11-10     | 2020-11-06 |     5 |
| 2020-11-10     | 2020-11-07 |     1 |
| 2020-11-10     | 2020-11-08 |     3 |
| 2020-11-10     | 2020-11-09 |     2 |
+----------------+------------+-------+

# wanted results
+----------------+------------+-------+
| date_collected |    date    | value |
+----------------+------------+-------+
| 2020-11-06     | 2020-11-05 |     1 |
| 2020-11-09     | 2020-11-05 |     2 |
| 2020-11-09     | 2020-11-06 |     5 |
| 2020-11-09     | 2020-11-07 |     1 |
| 2020-11-09     | 2020-11-08 |     1 |
| 2020-11-10     | 2020-11-08 |     2 |
| 2020-11-10     | 2020-11-09 |     2 |
+----------------+------------+-------+

# current results
+----------------+------------+-------+
| date_collected |    date    | value |
+----------------+------------+-------+
| 2020-11-06     | 2020-11-05 |     1 |
| 2020-11-09     | 2020-11-05 |     2 |
| 2020-11-09     | 2020-11-08 |     1 |
| 2020-11-10     | 2020-11-08 |     2 |
| 2020-11-10     | 2020-11-09 |     2 |
+----------------+------------+-------+
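To make the gap between the wanted and current results easier to reproduce, here is a sketch that rebuilds the input table as a DataFrame (dates kept as plain strings, which is an assumption) and applies the plain `groupby(level=1).diff()`. It suggests where rows get lost: a `date` that appears for the first time in a snapshot has no predecessor within its group, so `diff()` returns NaN for it and a later `dropna()` discards it.

```python
import pandas as pd

# Input data from the question: (date_collected, date, value).
rows = [
    ('2020-11-06', '2020-11-01', 4), ('2020-11-06', '2020-11-02', 5),
    ('2020-11-06', '2020-11-03', 1), ('2020-11-06', '2020-11-04', 3),
    ('2020-11-06', '2020-11-05', 1), ('2020-11-09', '2020-11-04', 3),
    ('2020-11-09', '2020-11-05', 3), ('2020-11-09', '2020-11-06', 5),
    ('2020-11-09', '2020-11-07', 1), ('2020-11-09', '2020-11-08', 1),
    ('2020-11-10', '2020-11-05', 3), ('2020-11-10', '2020-11-06', 5),
    ('2020-11-10', '2020-11-07', 1), ('2020-11-10', '2020-11-08', 3),
    ('2020-11-10', '2020-11-09', 2),
]
df = pd.DataFrame(rows, columns=['date_collected', 'date', 'value'])

# Difference between consecutive snapshots of the same `date`.
chng = df.set_index(['date_collected', 'date'])['value']
chng = chng.groupby(level=1).diff()

# First appearance of each `date` has nothing to subtract from: NaN.
missing = chng[chng.isna()]
# Rows that survive dropping NaN and zero differences.
kept = chng[chng.notna() & (chng != 0)]

print(missing)
print(kept)
```

Among the NaN rows are (2020-11-09, 2020-11-06) and (2020-11-09, 2020-11-07), the two rows that appear in the wanted results but not in the current ones.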

1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-11-17 18:47:31

I was able to reindex the MultiIndex and get the desired result with the following code:

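import pandas as pd

# Assumes both date columns are datetime64; string dates would not match
# the DatetimeIndex built by pd.date_range when reindexing.
dates = pd.date_range(df['date'].min(), df['date'].max())
new_idx = pd.MultiIndex.from_product([df['date_collected'].unique(), dates])

# Expand to every (date_collected, date) pair; missing pairs become 0
df = df.set_index(['date_collected', 'date'])
df = df.reindex(new_idx).fillna(0)

# Difference between consecutive collection dates for each date
chng = df.groupby(level=1).diff()
chng = chng[chng['value'] != 0].dropna()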
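For readers who want to try the answer end to end, the following is a sketch that runs the same reindexing approach on the sample data from the question. The `pd.to_datetime` conversion, the index level names, and the intermediate name `full` are assumptions added for this illustration; they are not part of the original answer.

```python
import pandas as pd

# Input data from the question: (date_collected, date, value).
rows = [
    ('2020-11-06', '2020-11-01', 4), ('2020-11-06', '2020-11-02', 5),
    ('2020-11-06', '2020-11-03', 1), ('2020-11-06', '2020-11-04', 3),
    ('2020-11-06', '2020-11-05', 1), ('2020-11-09', '2020-11-04', 3),
    ('2020-11-09', '2020-11-05', 3), ('2020-11-09', '2020-11-06', 5),
    ('2020-11-09', '2020-11-07', 1), ('2020-11-09', '2020-11-08', 1),
    ('2020-11-10', '2020-11-05', 3), ('2020-11-10', '2020-11-06', 5),
    ('2020-11-10', '2020-11-07', 1), ('2020-11-10', '2020-11-08', 3),
    ('2020-11-10', '2020-11-09', 2),
]
df = pd.DataFrame(rows, columns=['date_collected', 'date', 'value'])

# Assumption: convert to datetime so the index matches pd.date_range.
df['date_collected'] = pd.to_datetime(df['date_collected'])
df['date'] = pd.to_datetime(df['date'])

# Every calendar day between the first and last `date`.
dates = pd.date_range(df['date'].min(), df['date'].max())
new_idx = pd.MultiIndex.from_product(
    [df['date_collected'].unique(), dates],
    names=['date_collected', 'date'],
)

# Expand to the full grid; pairs that were never collected become 0.
full = df.set_index(['date_collected', 'date']).reindex(new_idx).fillna(0)

# Difference against the previous collection date, per `date`.
chng = full.groupby(level='date').diff()
chng = chng[chng['value'] != 0].dropna()
print(chng)
```

Two things are worth checking against your own data. Because missing pairs are filled with 0, a `date` that drops out of a newer snapshot shows up as a negative difference (for example 2020-11-04 under 2020-11-10 in this sample). The earliest collection date has nothing before it, so its rows come back as NaN and are removed by `dropna()`.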
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/64857309
