The dataframe needs to be cleaned of specific days. The dates to remove are selected like this:
df['_datetime'] = df.index
exclude_holidays = df.groupby(df.index.floor('d'))._datetime.last()
exclude_holidays = exclude_holidays.loc[exclude_holidays.dt.hour < 14]
Output of exclude_holidays:
datetime
2020-12-24 2020-12-24 12:07:12
2021-01-18 2021-01-18 11:52:57
2021-02-15 2021-02-15 08:46:44
2021-05-31 2021-05-31 11:29:36
2021-07-05 2021-07-05 11:56:05
2021-09-06 2021-09-06 11:33:40
2021-11-25 2021-11-25 11:59:37
2021-11-26 2021-11-26 12:14:51
2022-01-17 2022-01-17 11:59:38
2022-02-21 2022-02-21 11:59:42
2022-05-30 2022-05-30 11:59:56
2022-06-20 2022-06-20 11:59:53
2022-07-04 2022-07-04 11:41:38
2022-09-05 2022-09-05 11:59:30
Name: _datetime, dtype: datetime64[ns]
Now, how do I remove these days from the dataframe?
I tried this:
df = df.drop(df.loc[df.index.normalize() == exclude_holidays].index, axis=0)
...which throws this error:
ValueError: Lengths must match to compare
With this attempt:
df = df.drop(df.loc[exclude_holidays].index, axis=0)
there is no error, but the days are not removed.
This is the dataframe:
Open High Low Close
datetime
2020-12-17 08:30:00 3686.00 3687.50 3686.00 3687.50
2020-12-17 08:30:03 3687.75 3689.00 3687.50 3689.00
2020-12-17 08:31:17 3689.25 3690.50 3689.00 3690.50
2020-12-17 08:32:36 3690.75 3689.00 3687.50 3687.50
2020-12-17 08:43:12 3687.25 3687.50 3686.00 3686.00
... ... ... ... ...
2022-11-11 14:57:30 3998.00 4001.25 3999.75 4001.25
2022-11-11 14:59:40 4001.50 3999.75 3998.25 3999.75
2022-11-11 14:59:59 4000.00 4001.25 3999.75 4001.25
2022-11-11 14:59:59 4001.50 4002.75 4001.25 4002.75
2022-11-11 15:00:09 4003.00 4001.25 3999.75 3999.75
How can I remove these days from the dataframe?
Posted on 2022-11-13 11:10:19
You can do it like this:
exclude_holidays = exclude_holidays.index.values
# Compare each row's calendar day, not its full timestamp, against the excluded days.
df = df[~df.index.normalize().isin(exclude_holidays)]
https://stackoverflow.com/questions/74420307
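For reference, here is a minimal self-contained sketch of the same idea on made-up data. The sample timestamps, the Close values, and the names last_tick, short_days, and cleaned are illustrative assumptions, not taken from the question. The point it tries to show is that the rows have to be matched on their calendar day (via normalize()), because the excluded days are stored as midnight timestamps while the rows carry intraday times.

```python
import pandas as pd

# Synthetic intraday data (made up for illustration):
# two full sessions and one short session on 2020-12-24.
index = pd.to_datetime([
    "2020-12-23 08:30:00", "2020-12-23 14:59:59",
    "2020-12-24 08:30:00", "2020-12-24 12:07:12",
    "2020-12-28 08:30:00", "2020-12-28 15:00:09",
])
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)
df.index.name = "datetime"

# Last timestamp of each calendar day, keyed by that day at midnight.
last_tick = df.index.to_series().groupby(df.index.normalize()).last()

# Days whose last tick falls before 14:00 are treated as short sessions.
short_days = last_tick[last_tick.dt.hour < 14].index

# Build a boolean mask on each row's day and keep everything else.
cleaned = df[~df.index.normalize().isin(short_days)]
```

A boolean mask like this also copes with duplicate timestamps in the index. By contrast, df.loc[exclude_holidays] looks up the stored last-tick timestamps as row labels, so dropping its index removes only the final row of each short day rather than the whole day.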