I want to analyze several days of hourly data, for example df:
DATE TIME Threshold Value
2022-11-04 02:00:00 10 9
2022-11-04 03:00:00 11 10
2022-11-04 04:00:00 10 11
2022-11-04 06:00:00 12 11
2022-11-04 05:00:00 12 12
2022-11-04 07:00:00 10 11
2022-11-04 08:00:00 11 10
2022-11-04 09:00:00 11 9
2022-11-04 10:00:00 12 9
2022-11-04 11:00:00 10 10
2022-11-04 12:00:00 10 10
...
2022-11-05 01:00:00 10 9
2022-11-05 02:00:00 11 10
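...
Now I want to inspect the data based on Threshold/Value and time. Say I am interested in the Value at "08:00:00" when the Threshold at the earlier time "04:00:00" was 10. To find possible patterns, I may also want to look at other combinations in the future.
My approach is:
This seems a bit cumbersome, and I wonder whether there is a more practical way to do it. Maybe someone can suggest a more efficient method?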
Posted on 2022-11-08 11:10:59
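First, build a DatetimeIndex:
import pandas as pd
date_idx = df.iloc[:, :2].astype('str').apply(lambda x: pd.to_datetime(' '.join(x)), axis=1)
Then add a column new that holds the Threshold from 4 hours earlier, and assign the result to df1:
df1 = (df.set_index(date_idx)
         .drop(['DATE', 'TIME'], axis=1)
         .sort_index()
         # use a lambda: df1 does not exist yet while the chain is being evaluated
         .assign(new=lambda d: d.shift(freq='4H')['Threshold']))
Output (df1):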
Threshold Value new
2022-11-04 02:00:00 10 9 NaN
2022-11-04 03:00:00 11 10 NaN
2022-11-04 04:00:00 10 11 NaN
2022-11-04 05:00:00 12 12 NaN
2022-11-04 06:00:00 12 11 10.0
2022-11-04 07:00:00 10 11 11.0
2022-11-04 08:00:00 11 10 10.0
2022-11-04 09:00:00 11 9 12.0
2022-11-04 10:00:00 12 9 12.0
2022-11-04 11:00:00 10 10 10.0
2022-11-04 12:00:00 10 10 11.0
Filter the data at 08:00:
df1.at_time('08:00')
Output:
Threshold Value new
2022-11-04 08:00:00 11 10 10.0
Then check or filter on the Value and new columns.
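Since the question mentions trying other time/threshold combinations later, the shift-and-filter idea from this answer could be wrapped in a small function. The following is only a sketch: the helper name `value_given_lagged_threshold`, its parameters, and the `prev_threshold` column are made up for illustration, and the sample frame contains just the 2022-11-04 rows shown in the question.

```python
import pandas as pd

# Sample rows copied from the question (06:00 and 05:00 are out of order there).
df = pd.DataFrame({
    "DATE": ["2022-11-04"] * 11,
    "TIME": ["02:00:00", "03:00:00", "04:00:00", "06:00:00", "05:00:00",
             "07:00:00", "08:00:00", "09:00:00", "10:00:00", "11:00:00",
             "12:00:00"],
    "Threshold": [10, 11, 10, 12, 12, 10, 11, 11, 12, 10, 10],
    "Value": [9, 10, 11, 11, 12, 11, 10, 9, 9, 10, 10],
})


def value_given_lagged_threshold(df, at, hours_back, threshold):
    """Hypothetical helper: rows at clock time `at` whose Threshold
    `hours_back` hours earlier equals `threshold`."""
    # Combine DATE and TIME into one timestamp and use it as a sorted index.
    ts = pd.to_datetime(df["DATE"].astype(str) + " " + df["TIME"].astype(str))
    indexed = df.set_index(ts).drop(columns=["DATE", "TIME"]).sort_index()
    # Move each Threshold forward in time so it lines up with the later row.
    lagged = indexed["Threshold"].shift(freq=pd.Timedelta(hours=hours_back))
    indexed = indexed.assign(prev_threshold=lagged)
    # Keep only the requested clock time, then apply the threshold condition.
    at_rows = indexed.at_time(at)
    return at_rows[at_rows["prev_threshold"].eq(threshold)]


result = value_given_lagged_threshold(df, at="08:00", hours_back=4, threshold=10)
print(result)
```

Because the shift works on timestamps rather than row positions, gaps or unsorted rows in the hourly data should not pair a row with the wrong earlier hour; rows with no observation `hours_back` hours earlier simply get NaN and are filtered out.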
Posted on 2022-11-08 14:58:41
Here is one way to do it:
out=(df.loc[
(df['TIME'].isin(['04:00:00','08:00:00']) & # choose rows where time is 4:00 or 8:00
df['DATE'].isin( # and date where
df.loc[df['TIME'].eq('04:00:00') & # time is 04:00:00
df['Threshold'].eq(10)]['DATE']) # and Threshold is 10
)])
out
DATE TIME Threshold Value
2 2022-11-04 04:00:00 10 11
6 2022-11-04 08:00:00 11 10
Or, same as above, but select only the rows where the time equals 08:00:00:
out=(df.loc[
(df['TIME'].isin(['08:00:00']) &
df['DATE'].isin(
df.loc[df['TIME'].eq('04:00:00') &
df['Threshold'].eq(10)]['DATE'])
)])
out
DATE TIME Threshold Value
6 2022-11-04 08:00:00 11 10
https://stackoverflow.com/questions/74359356
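The `isin` filter from the second answer can also be parameterized so that other combinations are easy to try. This is a minimal sketch, assuming DATE and TIME are stored as strings as in the question; the helper name `rows_on_matching_dates` and its parameters are hypothetical, and the sample frame again holds only the 2022-11-04 rows shown above.

```python
import pandas as pd

# Sample rows copied from the question, in their original order.
df = pd.DataFrame({
    "DATE": ["2022-11-04"] * 11,
    "TIME": ["02:00:00", "03:00:00", "04:00:00", "06:00:00", "05:00:00",
             "07:00:00", "08:00:00", "09:00:00", "10:00:00", "11:00:00",
             "12:00:00"],
    "Threshold": [10, 11, 10, 12, 12, 10, 11, 11, 12, 10, 10],
    "Value": [9, 10, 11, 11, 12, 11, 10, 9, 9, 10, 10],
})


def rows_on_matching_dates(df, cond_time, cond_threshold, target_times):
    """Hypothetical helper: rows at `target_times` on every date where
    the Threshold at `cond_time` equals `cond_threshold`."""
    # Dates on which the condition row (time + threshold) is present.
    cond = df["TIME"].eq(cond_time) & df["Threshold"].eq(cond_threshold)
    dates = df.loc[cond, "DATE"]
    # Rows at the requested times, restricted to those dates.
    return df.loc[df["TIME"].isin(target_times) & df["DATE"].isin(dates)]


out = rows_on_matching_dates(df, "04:00:00", 10, ["08:00:00"])
print(out)
```

Note that this variant matches on the calendar date, so it only relates times within the same day; a condition at 22:00 and a target at 02:00 the next day would need the timestamp-based shift from the first answer instead.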