I have the following sample data from multiple motion sensors (multiple_sensors.csv):
sensorid,date_time,value
303,2012-06-25 11:15:35,0
404,2012-06-25 11:15:35,0
101,2012-06-25 11:15:35,0
202,2012-06-25 11:15:35,0
303,2012-06-25 11:15:36,0
404,2012-06-25 11:15:36,0
101,2012-06-25 11:15:36,0
202,2012-06-25 11:15:36,1
303,2012-06-25 11:15:37,0
404,2012-06-25 11:15:37,0
101,2012-06-25 11:15:37,0
202,2012-06-25 11:15:37,1
303,2012-06-25 11:15:38,0
404,2012-06-25 11:15:38,0
101,2012-06-25 11:15:38,0
202,2012-06-25 11:15:38,0
303,2012-06-25 11:15:39,0
404,2012-06-25 11:15:39,1
101,2012-06-25 11:15:39,0
202,2012-06-25 11:15:39,0
303,2012-06-25 11:15:40,0
404,2012-06-25 11:15:40,1
101,2012-06-25 11:15:40,0
202,2012-06-25 11:15:40,0
303,2012-06-25 11:15:41,1
404,2012-06-25 11:15:41,0
101,2012-06-25 11:15:41,0
202,2012-06-25 11:15:41,0
303,2012-06-25 11:15:42,1
404,2012-06-25 11:15:42,0
101,2012-06-25 11:15:42,0
202,2012-06-25 11:15:42,0
303,2012-06-25 11:15:43,1
404,2012-06-25 11:15:43,0
101,2012-06-25 11:15:43,0
202,2012-06-25 11:15:43,0
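303,2012-06-25 11:15:44,0
I need to return the id and duration of each motion-sensor event (see output.png). The value column indicates whether motion was triggered (1 = motion triggered, 0 = no motion), and the date_time column indicates when the motion started or ended.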
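As a reference point for the event definition above, here is a plain-Python sketch (not from the original post). It assumes the rows are in time order and that an event runs from a sensor's first 1 until that same sensor's next 0; events still open when the data ends are ignored. `motion_events` is a hypothetical helper name.

```python
from datetime import datetime


def motion_events(rows):
    """Return (sensor_id, start, end, seconds) for each complete motion event.

    rows: iterable of (sensor_id, 'YYYY-mm-dd HH:MM:SS', value) in time order.
    An event opens at a sensor's first 1 and closes at that sensor's next 0.
    Events still open when the data ends are ignored.
    """
    open_since = {}  # sensor_id -> start time of its currently open event
    events = []
    for sensor_id, ts, value in rows:
        t = datetime.strptime(ts, '%Y-%m-%d %H:%M:%S')
        if value == 1 and sensor_id not in open_since:
            open_since[sensor_id] = t
        elif value == 0 and sensor_id in open_since:
            start = open_since.pop(sensor_id)
            events.append((sensor_id, start, t, int((t - start).total_seconds())))
    # Order the events by their start time
    return sorted(events, key=lambda e: e[1])
```

Fed the rows of multiple_sensors.csv (with value converted to int), it yields one event each for 202 (2 s), 404 (2 s) and 303 (3 s), which can be used to cross-check the pandas versions.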
So far, I have managed to extract the id and duration for a single motion sensor, using the data below (single_sensor.csv; see output.png):
sensorid,date_time,value
202,2012-06-25 00:01:07,0
202,2012-06-25 00:01:08,1
202,2012-06-25 00:01:09,1
202,2012-06-25 00:01:10,0
202,2012-06-25 00:02:12,0
202,2012-06-25 00:02:13,1
202,2012-06-25 00:02:14,1
202,2012-06-25 00:02:15,1
202,2012-06-25 00:02:16,0
202,2012-06-25 00:03:40,0
202,2012-06-25 00:03:41,1
202,2012-06-25 00:03:42,1
202,2012-06-25 00:03:43,1
202,2012-06-25 00:03:44,0
202,2012-06-25 00:05:11,0
202,2012-06-25 00:05:12,1
202,2012-06-25 00:05:13,1
202,2012-06-25 00:05:14,0
202,2012-06-25 00:06:19,0
202,2012-06-25 00:06:20,1
202,2012-06-25 00:06:21,1
202,2012-06-25 00:06:22,0
For the single-sensor code, I followed the example here (calculating the duration between events with pandas):
import pandas as pd
data_time_format = '%Y-%m-%d %H:%M:%S'
df = pd.read_csv('single_sensor.csv')
df['date_time'] = pd.to_datetime(df['date_time'], format=data_time_format)
# Label each run of 1s with the label of the 0 that closes it:
# cumsum() goes up at every 0, rows with value 1 are masked to NaN,
# and bfill() copies the closing 0's label back onto them
a = (df['value'] != 1).cumsum().mask(df['value'] == 1)
df['value group'] = a.bfill()
# Keep only groups that contain both a 1 and a 0 (a complete event),
# then take the first (start) and last (end) timestamp of each group
df_final = df.groupby('value group').filter(lambda x: set(x['value']) == set([1,0]))\
    .groupby('value group')['date_time'].agg(['first','last'])\
    .rename(columns={'first':'start','last':'end'})\
    .reset_index()
df_final['id'] = df['sensorid'].iloc[0]  # single sensor: every row has the same id
df_final['duration'] = df_final['end'] - df_final['start']
df_final['duration'] = df_final['duration'].dt.total_seconds().astype(int)
print(df_final)
How can I extend this to multiple_sensors.csv to get my expected output?
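To see what the `value group` labels in the code above look like, this sketch (not from the original post) replays the cumsum/mask/bfill step on the value column of the first nine rows of single_sensor.csv, typed in by hand.

```python
import pandas as pd

# value column of the first nine rows of single_sensor.csv
value = pd.Series([0, 1, 1, 0, 0, 1, 1, 1, 0])

counter = (value != 1).cumsum()    # goes up by one at every 0
masked = counter.mask(value == 1)  # rows with value 1 become NaN
groups = masked.bfill()            # each 1 takes the label of the 0 that ends its run

print(pd.DataFrame({'value': value, 'counter': counter,
                    'masked': masked, 'value group': groups}))
```

Only groups 2 (rows 1 to 3) and 4 (rows 5 to 8) contain both a 1 and a 0, so they are the ones that pass the `filter`; their first and last timestamps (00:01:08 to 00:01:10 and 00:02:13 to 00:02:16) become `start` and `end`.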
Posted on 2020-07-02 04:01:22
IIUC, let's try this:
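import pandas as pd
# df: multiple_sensors.csv, loaded as in the question
df = pd.read_csv('multiple_sensors.csv')
df['date_time'] = pd.to_datetime(df['date_time'])
def f(df):
    # Same labelling logic as the single-sensor code, applied to one sensor's rows
    df = df.copy()
    a = (df['value'] != 1).cumsum().mask(df['value'] == 1)
    df['value group'] = a.bfill()
    df_final = df.groupby('value group').filter(lambda x: set(x['value']) == set([1,0]))\
        .groupby('value group')['date_time'].agg(['first','last'])\
        .rename(columns={'first':'start','last':'end'})\
        .reset_index()
    if df_final.shape[0] == 0:
        return  # no complete event for this sensor; the group is left out of the result
    # The sensor id comes back from the groupby key, so no id column is needed here
    df_final['duration'] = df_final['end'] - df_final['start']
    df_final['duration'] = df_final['duration'].dt.total_seconds().astype(int)
    return df_final
df_out = df.groupby('sensorid').apply(f).reset_index().drop(['level_1', 'value group'], axis=1)
df_out = df_out.sort_values('start')
df_out
Output: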
sensorid start end duration
0 202 2012-06-25 11:15:36 2012-06-25 11:15:38 2
2 404 2012-06-25 11:15:39 2012-06-25 11:15:41 2
1 303 2012-06-25 11:15:41 2012-06-25 11:15:44 3
Note: this may need a more robust test case, but the idea is to reuse your previous logic inside a custom function that groupby('sensorid') calls for each sensor.
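The per-sensor labelling can also be done without `groupby.apply`. The sketch below (not part of the answer) runs the same cumsum/mask/bfill idea with grouped `cumsum` and `bfill` so the counter restarts for each sensor, then keeps only groups that contain both a 0 and a 1. `sensor_events` is a hypothetical name, and the frame is assumed to have the question's columns with `date_time` already parsed.

```python
import pandas as pd


def sensor_events(df: pd.DataFrame) -> pd.DataFrame:
    """Start, end and duration (seconds) of every complete event, per sensor.

    Assumes columns sensorid, date_time (datetime64) and value (0/1).
    """
    df = df.sort_values(['sensorid', 'date_time'])
    by_sensor = df['sensorid']
    # Counter that increases at every 0, restarted for each sensor
    counter = (df['value'] != 1).astype(int).groupby(by_sensor).cumsum()
    # 1-rows take the label of the 0 that closes their run (same sensor only);
    # a run still open at the end of a sensor's data stays NaN and is dropped
    label = counter.mask(df['value'] == 1).groupby(by_sensor).bfill()
    labelled = df.assign(label=label).dropna(subset=['label'])
    out = (labelled.groupby(['sensorid', 'label'])
                   .agg(start=('date_time', 'first'), end=('date_time', 'last'),
                        lo=('value', 'min'), hi=('value', 'max'))
                   .reset_index())
    # A complete event contains at least one 1 and the closing 0
    out = out[(out['lo'] == 0) & (out['hi'] == 1)]
    out = out.assign(duration=(out['end'] - out['start']).dt.total_seconds().astype(int))
    return (out[['sensorid', 'start', 'end', 'duration']]
            .sort_values('start')
            .reset_index(drop=True))
```

With `df` loaded as in the question, `sensor_events(df)` should return the same three events (202, 404, 303) ordered by start. Avoiding `apply` also sidesteps pandas' changing treatment of the grouping column inside applied functions.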
Posted on 2020-07-02 02:47:27
For one sensor:
import pandas as pd
df = pd.read_csv('single_sensor.csv')
df['date_time'] = pd.to_datetime(df['date_time'])
# Assumes the data starts with value=0 (ignored); an event starts at value=1 and ends at the next value=0
# Keep only the rows where the value differs from the previous row
selected_rows = df['value'] != df['value'].shift(1)
selected_rows.iloc[0] = False  # the first row has no predecessor, so it is not a change
df2 = df[selected_rows].copy()
df2['start'] = df2['date_time']
# Each change ends where the next change begins
df2['end'] = df2['date_time'].shift(-1)
df2.drop(['date_time'], axis=1, inplace=True)
# Changes to value=1 are event starts
df3 = df2[df2['value'] == 1].copy()
df3['duration'] = df3['end'] - df3['start']
df3.drop('value', axis=1, inplace=True)
df3
Output
sensorid start end duration
1 202 2012-06-25 00:01:08 2012-06-25 00:01:10 00:00:02
5 202 2012-06-25 00:02:13 2012-06-25 00:02:16 00:00:03
10 202 2012-06-25 00:03:41 2012-06-25 00:03:44 00:00:03
15 202 2012-06-25 00:05:12 2012-06-25 00:05:14 00:00:02
19 202 2012-06-25 00:06:20 2012-06-25 00:06:22 00:00:02
Multiple sensors:
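import pandas as pd
df = pd.read_csv('multiple_sensors.csv')
df['date_time'] = pd.to_datetime(df['date_time'])
# Put each sensor's readings together, in time order
df2 = df.sort_values(['sensorid', 'date_time'])
# Compare with the previous reading of the SAME sensor, so changes never leak across sensors
prev = df2.groupby('sensorid')['value'].shift(1)
# A row is selected when its value changes; each sensor's first reading (prev is NaN) is ignored
selected_rows = (df2['value'] != prev) & prev.notna()
df3 = df2[selected_rows].copy()
df3['start'] = df3['date_time']
# An event ends at the next change of the same sensor
df3['end'] = df3.groupby('sensorid')['date_time'].shift(-1)
df3.drop(['date_time'], axis=1, inplace=True)
df4 = df3[df3['value'] == 1].copy()
df4['duration'] = df4['end'] - df4['start']
df4.drop('value', axis=1, inplace=True)
df4.sort_values('start')
Output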
sensorid start end duration
7 202 2012-06-25 11:15:36 2012-06-25 11:15:38 00:00:02
17 404 2012-06-25 11:15:39 2012-06-25 11:15:41 00:00:02
24 303 2012-06-25 11:15:41 2012-06-25 11:15:44 00:00:03
Removing overlapping times:
import pandas as pd
data = [
(202, pd.to_datetime('2012-06-25 00:11:47'),
pd.to_datetime('2012-06-25 00:11:49'), 2),
(404, pd.to_datetime('2012-06-25 00:11:48'),
pd.to_datetime('2012-06-25 00:11:50'), 2)
]
df = pd.DataFrame(data, columns=['sensor_id', 'start', 'end', 'duration'])
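# Shift the end times down one row so each event is compared with the previous one;
# the first row gets a sentinel date far in the past, so it is always kept
df['end_shift'] = df['end'].shift().fillna(pd.to_datetime('1971-01-01'))
# Keep only events that start at or after the previous event's end
df[df['start'] >= df['end_shift']].drop('end_shift', axis=1)
Output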
sensor_id start end duration
0 202 2012-06-25 00:11:47 2012-06-25 00:11:49 2
Group durations (consecutive events of the same sensor):
import pandas as pd
data = [
(202, pd.to_datetime('2020-06-25 00:11:43'), pd.to_datetime('2020-06-25 00:11:45'),2),
(202, pd.to_datetime('2020-06-25 00:11:47'), pd.to_datetime('2020-06-25 00:11:49'),2),
(404, pd.to_datetime('2020-06-25 00:11:51'), pd.to_datetime('2020-06-25 00:11:54'),3),
(404, pd.to_datetime('2020-06-25 00:11:55'), pd.to_datetime('2020-06-25 00:11:57'),2),
(202, pd.to_datetime('2020-06-25 00:11:58'), pd.to_datetime('2020-06-25 00:12:01'),3),
(202, pd.to_datetime('2020-06-25 00:12:18'), pd.to_datetime('2020-06-25 00:12:21'),3),
(101, pd.to_datetime('2020-06-25 00:12:21'), pd.to_datetime('2020-06-25 00:12:23'),2),
(101, pd.to_datetime('2020-06-25 00:12:32'), pd.to_datetime('2020-06-25 00:12:34'),2),
]
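df = pd.DataFrame(data, columns=['sensor_id', 'start', 'end', 'duration'])
# id = sensor of the next row; a row where it differs is the last row of a run
df['id'] = df['sensor_id'].shift(-1)
# Running total of the durations
df['cumsum'] = df['duration'].cumsum()
df2 = df[df['id'] != df['sensor_id']].copy()
# Duration of each run = difference between the running totals at consecutive run ends
df2['duration2'] = df2['cumsum'] - df2['cumsum'].shift().fillna(0)
df2[['sensor_id', 'duration2']]
Output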
sensor_id duration2
1 202 4.0
3 404 5.0
5 202 6.0
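7 101 4.0
The requirement was not clear from the start. Here, all the originally computed durations are discarded and new durations are recalculated. Had the requirement been clear up front, the solution would have been shorter.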
import pandas as pd
data = [
(202, pd.to_datetime('2020-06-25 00:11:43'), pd.to_datetime('2020-06-25 00:11:45'),2),
(202, pd.to_datetime('2020-06-25 00:11:47'), pd.to_datetime('2020-06-25 00:11:49'),2),
(404, pd.to_datetime('2020-06-25 00:11:51'), pd.to_datetime('2020-06-25 00:11:54'),3),
(404, pd.to_datetime('2020-06-25 00:11:55'), pd.to_datetime('2020-06-25 00:11:57'),2),
(202, pd.to_datetime('2020-06-25 00:11:58'), pd.to_datetime('2020-06-25 00:12:01'),3),
(202, pd.to_datetime('2020-06-25 00:12:18'), pd.to_datetime('2020-06-25 00:12:21'),3),
(101, pd.to_datetime('2020-06-25 00:12:21'), pd.to_datetime('2020-06-25 00:12:23'),2),
(101, pd.to_datetime('2020-06-25 00:12:32'), pd.to_datetime('2020-06-25 00:12:34'),2),
]
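df = pd.DataFrame(data, columns=['sensor_id', 'start', 'end', 'duration'])
# id1 = sensor of the next row, id2 = sensor of the previous row
df['id1'] = df['sensor_id'].shift(-1)
df['id2'] = df['sensor_id'].shift(1)
# The last row of each run provides the end time...
df2 = df[df['id1'] != df['sensor_id']].copy().reset_index()
# ...and the first row of each run provides the start time (aligned by position after reset_index)
df2['start'] = df[df['id2'] != df['sensor_id']].reset_index()['start']
# Recompute the duration from the merged start and end
df2['duration'] = df2['end'] - df2['start']
df2.drop(['id1', 'id2'], axis=1, inplace=True)
df2
Output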
index sensor_id start end duration
0 1 202 2020-06-25 00:11:43 2020-06-25 00:11:49 00:00:06
1 3 404 2020-06-25 00:11:51 2020-06-25 00:11:57 00:00:06
2 5 202 2020-06-25 00:11:58 2020-06-25 00:12:21 00:00:23
3 7 101 2020-06-25 00:12:21 2020-06-25 00:12:34 00:00:13
https://stackoverflow.com/questions/62687886
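Both steps of the last answer (summing the old durations per run, and recomputing start and end per run) can be sketched in one pass by giving each run of the same sensor a run id with `shift`/`cumsum` and aggregating per run. This is an alternative technique, not the answer's code; `merge_consecutive`, `busy` (sum of the original durations, like `duration2`) and `span` (last end minus first start, like the recomputed `duration`) are hypothetical names.

```python
import pandas as pd


def merge_consecutive(events: pd.DataFrame) -> pd.DataFrame:
    """Collapse consecutive rows of the same sensor_id into one row.

    Assumes columns sensor_id, start, end, duration (seconds), in time order.
    """
    # A new run starts whenever sensor_id differs from the previous row
    run_id = (events['sensor_id'] != events['sensor_id'].shift()).cumsum().rename('run')
    return (events.groupby(run_id)
                  .agg(sensor_id=('sensor_id', 'first'),
                       start=('start', 'first'),
                       end=('end', 'last'),
                       busy=('duration', 'sum'))                  # sum of the old durations
                  .assign(span=lambda d: d['end'] - d['start'])   # first start to last end
                  .reset_index(drop=True))
```

On the eight-row `data` list used in the answer, this should give busy = 4, 5, 6, 4 and span = 6 s, 6 s, 23 s, 13 s, i.e. both outputs shown there.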