Q: How do I handle the end of a time series when resampling in pandas?

Stack Overflow user

Asked 2018-10-10 00:57:52

Answers: 1 · Views: 427 · Followers: 0 · Votes: 3

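I want to upsample an hourly time series to half-hour intervals. The example uses .ffill(), but I have also tested .asfreq() as an intermediate step.

The goal is to get half-hour intervals in which each hourly value is split evenly across the upsampled intervals. I am looking for a general solution that works for any date range with the same problem.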

import pandas as pd

index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='H')
hourly = pd.Series(range(10, len(index)+10), index=index)
half_hourly = hourly.resample('30min').ffill() / 2

The hourly series looks like this:

2018-10-10 00:00:00    10
2018-10-10 01:00:00    11
2018-10-10 02:00:00    12
Freq: H, dtype: int64

and half_hourly looks like this:

2018-10-10 00:00:00    5.0
2018-10-10 00:30:00    5.0
2018-10-10 01:00:00    5.5
2018-10-10 01:30:00    5.5
2018-10-10 02:00:00    6.0
Freq: 30T, dtype: float64

The problem is at the end: there is no row for 02:30:00.

The result I want is:

2018-10-10 00:00:00    5.0
2018-10-10 00:30:00    5.0
2018-10-10 01:00:00    5.5
2018-10-10 01:30:00    5.5
2018-10-10 02:00:00    6.0
2018-10-10 02:30:00    6.0
Freq: 30T, dtype: float64

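I understand that the hourly series ends at 02:00, so there is no reason to expect pandas to insert the last half hour by default. However, after reading many deprecated or old posts, some newer ones, the documentation, and the cookbook, I still could not find a straightforward solution.

Finally, I also tested .mean(), but that did not fill the new rows, and interpolate() does not use the hourly average.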
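For reference, a minimal sketch of the two behaviours just described, assuming the same hourly series as above. It is written with the lowercase 'h' alias, which newer pandas releases prefer over 'H'; the names by_mean and by_interp are only illustrative.

```python
import pandas as pd

index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='h')
hourly = pd.Series(range(10, len(index) + 10), index=index)

# Upsampling with mean() leaves the new half-hour slots empty (NaN),
# because no hourly observation falls into those bins.
by_mean = hourly.resample('30min').mean()

# interpolate() fills the new slots linearly between neighbours instead of
# repeating the hourly value, so 00:30 sits halfway between 10 and 11.
by_interp = hourly.resample('30min').interpolate()

print(by_mean)
print(by_interp)
```

Neither result extends past 02:00, so the missing 02:30 row is independent of the fill method.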

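My .ffill() / 2 almost works as a way to spread each hour over two half hours, but it feels like a hack for a problem that I would expect pandas to already solve in a better way.

Thanks in advance.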

resampling

python

pandas

time-series

date-range

1 Answer

Stack Overflow user

Answered 2020-10-16 22:24:45

Your problem can be solved like this:

>>> import pandas as pd
>>> index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='H')
>>> hourly = pd.Series(range(10, len(index)+10), index=index)
>>> hourly.reindex(index.union(index.shift(freq='30min'))).ffill() / 2
2018-10-10 00:00:00    5.0
2018-10-10 00:30:00    5.0
2018-10-10 01:00:00    5.5
2018-10-10 01:30:00    5.5
2018-10-10 02:00:00    6.0
2018-10-10 02:30:00    6.0
Freq: 30T, dtype: float64
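An equivalent way to get the same rows, sketched here under the assumption that the source spacing (one hour) is known in advance, is to build the target grid explicitly and forward-fill onto it. The names end and target are only illustrative, and the lowercase 'h' alias is used because newer pandas releases prefer it over 'H'.

```python
import pandas as pd

index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='h')
hourly = pd.Series(range(10, len(index) + 10), index=index)

# Target grid: same start, but the end is pushed one source period (1 hour)
# forward and then dropped, so the last slot is 02:30 rather than 03:00.
end = hourly.index[-1] + pd.Timedelta(hours=1)
target = pd.date_range(hourly.index[0], end, freq='30min')[:-1]

# Forward-fill each hourly value onto the half-hour grid, then halve it so
# that every hour's total is preserved.
half_hourly = hourly.reindex(target, method='ffill') / 2
print(half_hourly)
```

Because every hour now contributes exactly two half-hour rows, half_hourly.sum() equals hourly.sum(), which is a quick sanity check that no period was cut off.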

I suspect this is a minimal example, so I will also try to solve it in general. Suppose you have several points to fill in for each day:

>>> import pandas as pd
>>> x = pd.Series([1.5, 2.5], pd.DatetimeIndex(['2018-09-21', '2018-09-22']))
>>> x.resample('6h').ffill()
2018-09-21 00:00:00    1.5
2018-09-21 06:00:00    1.5
2018-09-21 12:00:00    1.5
2018-09-21 18:00:00    1.5
2018-09-22 00:00:00    2.5
Freq: 6H, dtype: float64

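Use a similar trick to also include 06:00, 12:00, and 18:00 on 2018-09-22.

Reindex using a shift equal to the interval you want to treat as an inclusive endpoint. In this case the shift is one extra day: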

>>> import pandas as pd
>>> x = pd.Series([1.5, 2.5], pd.DatetimeIndex(['2018-09-21', '2018-09-22']))
>>> res = x.reindex(x.index.union(x.index.shift(freq='1D'))).resample('6h').ffill()
>>> res[:res.last_valid_index()]  # drop the start of next day
2018-09-21 00:00:00    1.5
2018-09-21 06:00:00    1.5
2018-09-21 12:00:00    1.5
2018-09-21 18:00:00    1.5
2018-09-22 00:00:00    2.5
2018-09-22 06:00:00    2.5
2018-09-22 12:00:00    2.5
2018-09-22 18:00:00    2.5
Freq: 6H, dtype: float64
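The same idea can be wrapped in a small function. The sketch below is only one possible packaging: upsample_inclusive is a hypothetical helper, not a pandas API, and it assumes the caller knows the spacing of the source data. It drops the extra row by comparing against the appended timestamp rather than using last_valid_index(), so trailing NaN values in the original data are not trimmed by accident.

```python
import pandas as pd

def upsample_inclusive(series, source_period, target_freq):
    """Forward-fill `series` onto `target_freq`, covering the whole last source period.

    Hypothetical helper for illustration; `source_period` is the spacing of
    the original data (e.g. '1D'), `target_freq` the finer frequency (e.g. '6h').
    """
    source_period = pd.Timedelta(source_period)
    # Append a placeholder row one source period after the last observation,
    # so that resample() spans the entire final period.
    sentinel = series.index[-1] + source_period
    extended = series.reindex(series.index.union(pd.DatetimeIndex([sentinel])))
    filled = extended.resample(target_freq).ffill()
    # Drop the placeholder row itself; it marks the start of the next period.
    return filled[filled.index < sentinel]

x = pd.Series([1.5, 2.5], pd.DatetimeIndex(['2018-09-21', '2018-09-22']))
six_hourly = upsample_inclusive(x, '1D', '6h')

index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='h')
hourly = pd.Series(range(10, len(index) + 10), index=index)
half_hourly = upsample_inclusive(hourly, '1h', '30min') / 2
```

With the data from this thread, six_hourly ends at 2018-09-22 18:00 and half_hourly ends at 2018-10-10 02:30.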

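Votes: 0

The original page content is provided by Stack Overflow. Translation support was provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: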

https://stackoverflow.com/questions/52731191
