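I am trying to write a program that monitors a house's electricity consumption in watts. I have two data sets. The first (df_appliances) contains several appliance columns, with each appliance's wattage recorded every 10 minutes, as shown below: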
Timestamp Dishwasher TV Desktop_PC Washer/Dryer AC_unit Fridge
10/28/2022 9:00 0 60 0 0 2500 250
10/28/2022 9:10 0 60 0 0 2500 250
10/28/2022 9:20 0 60 0 0 2500 250
10/28/2022 9:30 0 0 0 0 2500 250
10/28/2022 9:40 1800 0 0 0 2500 250
10/28/2022 9:50 1800 0 0 0 2500 250
10/28/2022 10:00 1800 0 0 0 2500 250
10/28/2022 10:10 1800 60 0 0 2500 250
10/28/2022 10:20 1800 60 0 0 2500 250
10/28/2022 10:30 1800 60 0 0 2500 250
10/28/2022 10:40 0 0 0 0 0 250
10/28/2022 10:50 0 0 0 0 0 250
10/28/2022 11:00 0 0 300 0 0 250
10/28/2022 11:10 0 0 300 0 0 250
10/28/2022 11:20 0 0 300 0 0 250
10/28/2022 11:30 0 60 300 0 0 250
10/28/2022 11:40 0 60 300 0 0 250
10/28/2022 11:50 0 60 300 0 2500 250
10/28/2022 12:00 0 60 300 3000 2500 250
10/28/2022 12:10 0 60 300 3000 2500 250
10/28/2022 12:20 0 0 300 3000 2500 250
10/28/2022 12:30 0 0 0 3000 2500 250
10/28/2022 12:40 0 0 0 3000 2500 250
10/28/2022 12:50 0 0 0 3000 2500 250
10/28/2022 13:00 0 0 0 3000 2500 250
Similarly, I get a second dataframe (df_total) from a different source; it contains the total wattage at each timestamp, as shown below:
DATETIME Total_Watts
10/28/2022 9:00 2810
10/28/2022 9:10 2810
10/28/2022 9:20 2810
10/28/2022 9:30 2750
10/28/2022 9:40 4550
10/28/2022 9:50 4550
10/28/2022 10:00 4550
10/28/2022 10:10 4610
10/28/2022 10:20 4610
10/28/2022 10:30 4610
10/28/2022 10:40 250
10/28/2022 10:50 250
10/28/2022 11:00 550
10/28/2022 11:10 550
10/28/2022 11:20 550
10/28/2022 11:30 610
10/28/2022 11:40 610
10/28/2022 11:50 3110
10/28/2022 12:00 6110
10/28/2022 12:10 6110
10/28/2022 12:20 6050
10/28/2022 12:30 5750
10/28/2022 12:40 5750
10/28/2022 12:50 5750
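10/28/2022 13:00 5750
The task at hand:
Build a dataframe (df_output) that is a filtered version of df_appliances, using the 4000-watt threshold in df_total as a time-based filter, and additionally keeping the data from the 20 minutes before each time the threshold is first reached. I was able to do the first part (the threshold filter without the 20-minute lead) with the following lines of code, though I would also welcome a more elegant solution in function form: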
start_time = df_total['DATETIME'][df_total['Total_Watts'] >= 4000].tolist()
df_output = df_appliances[df_appliances['Timestamp'].isin(start_time)]
Basically, my output should look like this:
Timestamp Dishwasher TV Desktop_PC Washer/Dryer AC_unit Fridge
10/28/2022 9:20 0 60 0 0 2500 250
10/28/2022 9:30 0 0 0 0 2500 250
10/28/2022 9:40 1800 0 0 0 2500 250
10/28/2022 9:50 1800 0 0 0 2500 250
10/28/2022 10:00 1800 0 0 0 2500 250
10/28/2022 10:10 1800 60 0 0 2500 250
10/28/2022 10:20 1800 60 0 0 2500 250
10/28/2022 10:30 1800 60 0 0 2500 250
10/28/2022 11:40 0 60 300 0 0 250
10/28/2022 11:50 0 60 300 0 2500 250
10/28/2022 12:00 0 60 300 3000 2500 250
10/28/2022 12:10 0 60 300 3000 2500 250
10/28/2022 12:20 0 0 300 3000 2500 250
10/28/2022 12:30 0 0 0 3000 2500 250
10/28/2022 12:40 0 0 0 3000 2500 250
10/28/2022 12:50 0 0 0 3000 2500 250
10/28/2022 13:00 0 0 0 3000 2500 250
Posted on 2022-10-28 22:22:30
One approach is to merge the two dataframes and then filter on your criteria.
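df = df_appliances.merge(df_total, left_on="Timestamp", right_on="DATETIME")
# Keep rows at or above 4000 W plus the two rows before them (20 minutes at 10-minute sampling, rows sorted by time).
print(df[(df.Total_Watts >= 4000) | (df.Total_Watts.shift(-1) >= 4000) | (df.Total_Watts.shift(-2) >= 4000)])
Edit: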
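import pandas as pd

# Parse the timestamps so that time arithmetic works on them.
df.Timestamp = pd.to_datetime(df.Timestamp)
start_times = df[df.Total_Watts >= 4000].Timestamp.to_list()
# Keep a row if any over-threshold timestamp lies between the row's time and 20 minutes after it.
df[df.apply(lambda row: any(val in pd.Interval(row.Timestamp - pd.Timedelta(microseconds=1),
                                               row.Timestamp + pd.Timedelta(minutes=20)) for val in start_times), axis=1)]
https://stackoverflow.com/questions/74240708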
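Since the question also asks for a function form, here is one possible sketch that does not rely on a fixed 10-minute spacing. It uses pandas.merge_asof with direction="forward" to look up, for every appliance row, the nearest over-threshold reading at or after that row, and keeps the row only if that reading is at most 20 minutes away. The function name filter_with_lead and its parameters are illustrative assumptions, not part of the original answer; the column names default to the ones used in the question.

```python
import pandas as pd


def filter_with_lead(df_appliances, df_total, threshold=4000, lead="20min",
                     ts_col="Timestamp", total_ts_col="DATETIME",
                     total_col="Total_Watts"):
    """Return appliance rows at, or up to `lead` before, a total >= threshold.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    # Work on a copy so the caller's dataframe is left untouched.
    appl = df_appliances.copy()
    appl[ts_col] = pd.to_datetime(appl[ts_col])
    appl = appl.sort_values(ts_col)  # merge_asof needs sorted keys

    # Timestamps at which the total is at or above the threshold.
    over = df_total.loc[df_total[total_col] >= threshold, total_ts_col]
    over = pd.to_datetime(over).sort_values().to_frame("hot_time")

    # For each appliance row, find the first over-threshold reading at or
    # after it; rows with none within `lead` get NaT in "hot_time".
    matched = pd.merge_asof(appl, over, left_on=ts_col, right_on="hot_time",
                            direction="forward", tolerance=pd.Timedelta(lead))

    # Keep matched rows only, with the original appliance columns.
    keep = matched["hot_time"].notna()
    return matched.loc[keep, appl.columns].reset_index(drop=True)
```

Calling df_output = filter_with_lead(df_appliances, df_total) on the data from the question should keep the 9:20 to 10:30 and 11:40 to 13:00 rows. Because the match is based on elapsed time rather than row offsets, the same call should also cope with irregular sampling, provided both timestamp columns parse to the same datetime type.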