Duration of multiple events in a datetime column in Python

Stack Overflow user
Asked 2020-07-02 02:09:50
2 answers · 266 views · 0 followers · 2 votes

I have the following sample data from multiple motion sensors (multiple_sensors.csv):

sensorid,date_time,value
303,2012-06-25 11:15:35,0
404,2012-06-25 11:15:35,0
101,2012-06-25 11:15:35,0
202,2012-06-25 11:15:35,0
303,2012-06-25 11:15:36,0
404,2012-06-25 11:15:36,0
101,2012-06-25 11:15:36,0
202,2012-06-25 11:15:36,1
303,2012-06-25 11:15:37,0
404,2012-06-25 11:15:37,0
101,2012-06-25 11:15:37,0
202,2012-06-25 11:15:37,1
303,2012-06-25 11:15:38,0
404,2012-06-25 11:15:38,0
101,2012-06-25 11:15:38,0
202,2012-06-25 11:15:38,0
303,2012-06-25 11:15:39,0
404,2012-06-25 11:15:39,1
101,2012-06-25 11:15:39,0
202,2012-06-25 11:15:39,0
303,2012-06-25 11:15:40,0
404,2012-06-25 11:15:40,1
101,2012-06-25 11:15:40,0
202,2012-06-25 11:15:40,0
303,2012-06-25 11:15:41,1
404,2012-06-25 11:15:41,0
101,2012-06-25 11:15:41,0
202,2012-06-25 11:15:41,0
303,2012-06-25 11:15:42,1
404,2012-06-25 11:15:42,0
101,2012-06-25 11:15:42,0
202,2012-06-25 11:15:42,0
303,2012-06-25 11:15:43,1
404,2012-06-25 11:15:43,0
101,2012-06-25 11:15:43,0
202,2012-06-25 11:15:43,0
303,2012-06-25 11:15:44,0

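I need to return the id and duration of each motion sensor event (see output.png). The value column indicates whether motion was triggered (1 = motion triggered, 0 = no motion), and the date_time column indicates when the motion started or ended.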

So far I have managed to extract the id and duration for a single motion sensor, using the data below (single_sensor.csv) (see output.png).

sensorid,date_time,value
202,2012-06-25 00:01:07,0
202,2012-06-25 00:01:08,1
202,2012-06-25 00:01:09,1
202,2012-06-25 00:01:10,0
202,2012-06-25 00:02:12,0
202,2012-06-25 00:02:13,1
202,2012-06-25 00:02:14,1
202,2012-06-25 00:02:15,1
202,2012-06-25 00:02:16,0
202,2012-06-25 00:03:40,0
202,2012-06-25 00:03:41,1
202,2012-06-25 00:03:42,1
202,2012-06-25 00:03:43,1
202,2012-06-25 00:03:44,0
202,2012-06-25 00:05:11,0
202,2012-06-25 00:05:12,1
202,2012-06-25 00:05:13,1
202,2012-06-25 00:05:14,0
202,2012-06-25 00:06:19,0
202,2012-06-25 00:06:20,1
202,2012-06-25 00:06:21,1
202,2012-06-25 00:06:22,0

For the single-sensor code I followed the example here (computing the duration between events with pandas):

import pandas as pd
import numpy as np
from pandas import read_csv
from datetime import datetime
from datetime import timedelta

data_time_format = '%Y-%m-%d %H:%M:%S'

df = read_csv('single_sensor.csv')
df['date_time'] = pd.to_datetime(df['date_time'], format=data_time_format)

a = (df['value'] != 1).cumsum().mask(df['value'] == 1)
df['value group'] = a.bfill()

df_final = df.groupby('value group').filter(lambda x: set(x['value']) == set([1,0]))\
           .groupby('value group')['date_time'].agg(['first','last'])\
           .rename(columns={'first':'start','last':'end'})\
           .reset_index()

df_final['id'] = df['sensorid']
df_final['duration'] = df_final['end'].values - df_final['start']
df_final['duration'] = df_final['duration'].dt.total_seconds().astype(int)
print(df_final)

How can I extend this to multiple_sensors.csv to get my expected output?

2 Answers

Stack Overflow user

Accepted answer

Posted 2020-07-02 04:01:22

IIUC, let's try this:

def f(df):
    # Copy so the group handed over by groupby is not modified in place
    df = df.copy()
    # Give each run of 1s the same label as the 0 that ends it
    a = (df['value'] != 1).cumsum().mask(df['value'] == 1)
    df['value group'] = a.bfill()

    # Keep only groups that contain both a 1 and a 0, i.e. a complete event
    df_final = df.groupby('value group').filter(lambda x: set(x['value']) == set([1,0]))\
           .groupby('value group')['date_time'].agg(['first','last'])\
           .rename(columns={'first':'start','last':'end'})\
           .reset_index()
    if df_final.shape[0] == 0:
        return
    df_final['duration'] = df_final['end'].values - df_final['start']
    df_final['duration'] = df_final['duration'].dt.total_seconds().astype(int)
    return df_final

# df is multiple_sensors.csv with date_time already parsed by pd.to_datetime
df_out = df.groupby('sensorid').apply(f).reset_index().drop(['level_1', 'value group'], axis=1)
df_out = df_out.sort_values('start')
df_out

Output:

   sensorid               start                 end  duration
0       202 2012-06-25 11:15:36 2012-06-25 11:15:38         2
1       303 2012-06-25 11:15:41 2012-06-25 11:15:44         3
2       404 2012-06-25 11:15:39 2012-06-25 11:15:41         2

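Note: this probably needs a more robust test case. The idea is simply to reuse the earlier logic inside a custom function that is applied per group via groupby on 'sensorid'.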
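To see what the grouping key inside f does, here is a minimal sketch on made-up readings for a single sensor. The value series below is hypothetical and not taken from the question's files.

```python
import pandas as pd

# Hypothetical readings for one sensor: a 1-row event and a 2-row event
value = pd.Series([0, 1, 0, 0, 1, 1, 0])

# Count the non-1 rows, then blank out the rows where value == 1 ...
labels = (value != 1).cumsum().mask(value == 1)
# ... and back-fill, so every 1 takes the label of the 0 that follows it
labels = labels.bfill()

print(labels.tolist())
# [1.0, 2.0, 2.0, 3.0, 4.0, 4.0, 4.0]
```

Each run of 1s ends up sharing a label with the 0 that closes it. Filtering for groups whose values are exactly {0, 1} therefore leaves one group per event, and the first and last timestamps of that group are the event's start and end.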

Votes: 0

Stack Overflow user

Posted 2020-07-02 02:47:27

For a single sensor:

import pandas as pd
df = pd.read_csv('single_sensor.csv')
df['date_time'] = pd.to_datetime(df['date_time'])

# Assumes the data starts with value=0 (that first row is ignored); an event starts at value=1 and ends at the next value=0
selected_rows = df['value'] != df['value'].shift(1)
selected_rows[0] = False

df2 = df[selected_rows].copy()

df2['start'] = df2['date_time']
df2['end'] = df2['date_time'].shift(-1)
df2.drop(['date_time'], axis=1, inplace=True)

df3 = df2[df2['value'] == 1].copy()

df3['duration'] = df3['end'] - df3['start']
df3.drop('value', axis=1, inplace=True)

Output:

    sensorid               start                 end duration
1        202 2012-06-25 00:01:08 2012-06-25 00:01:10 00:00:02
5        202 2012-06-25 00:02:13 2012-06-25 00:02:16 00:00:03
10       202 2012-06-25 00:03:41 2012-06-25 00:03:44 00:00:03
15       202 2012-06-25 00:05:12 2012-06-25 00:05:14 00:00:02
19       202 2012-06-25 00:06:20 2012-06-25 00:06:22 00:00:02

For multiple sensors:

import pandas as pd
df = pd.read_csv('multiple_sensors.csv')
df['date_time'] = pd.to_datetime(df['date_time'])
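# Sort so that each sensor's rows are contiguous and in time order
df2 = df.sort_values(['sensorid', 'date_time'])

# A row is selected when its value differs from the row before it
selected_rows = df2['value'] != df2['value'].shift(1)
# Skip the first row of the sorted frame (by position, because sorting reorders the index labels)
selected_rows.iloc[0] = False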

df3 = df2[selected_rows].copy()
df3['start'] = df3['date_time']
df3['end'] = df3['date_time'].shift(-1)
df3.drop(['date_time'], axis=1, inplace=True)

df4 = df3[df3['value'] == 1].copy()
df4['duration'] = df4['end'] - df4['start']
df4.drop('value', axis=1, inplace=True)
df4.sort_values('start') 

Output:

    sensorid               start                 end duration
7        202 2012-06-25 11:15:36 2012-06-25 11:15:38 00:00:02
17       404 2012-06-25 11:15:39 2012-06-25 11:15:41 00:00:02
24       303 2012-06-25 11:15:41 2012-06-25 11:15:44 00:00:03
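One caveat with shifting over the whole sorted frame is that the last row of one sensor is compared with the first row of the next. A possible variant, sketched here under assumptions, does the shifts per sensor instead. The helper name motion_events and the toy data are hypothetical, and dropping events that never return to 0 is a choice made for this sketch rather than something the question specifies.

```python
import pandas as pd

def motion_events(df):
    """Hypothetical helper: one row per completed motion event."""
    df = df.sort_values(['sensorid', 'date_time'])
    # Previous reading of the same sensor (NaN on each sensor's first row)
    prev = df.groupby('sensorid')['value'].shift(1)
    # Keep rows where the value changed; each sensor's first row is skipped
    edges = df[prev.notna() & (df['value'] != prev)].copy()
    edges['start'] = edges['date_time']
    # The next change of the same sensor is the 1 -> 0 edge that ends the event
    edges['end'] = edges.groupby('sensorid')['date_time'].shift(-1)
    events = edges.loc[edges['value'] == 1, ['sensorid', 'start', 'end']]
    # Assumption: an event with no closing 0 is dropped rather than reported
    events = events.dropna(subset=['end'])
    events = events.assign(duration=events['end'] - events['start'])
    return events.sort_values('start').reset_index(drop=True)

# Made-up data: sensor 1 is still active on its last row,
# sensor 2 has one complete 1-second event
toy = pd.DataFrame({
    'sensorid': [1, 1, 1, 2, 2, 2],
    'date_time': pd.to_datetime([
        '2012-06-25 11:15:35', '2012-06-25 11:15:36', '2012-06-25 11:15:37',
        '2012-06-25 11:15:35', '2012-06-25 11:15:36', '2012-06-25 11:15:37',
    ]),
    'value': [0, 1, 1, 0, 1, 0],
})
events = motion_events(toy)
print(events)
```

On this toy data only the event of sensor 2 is returned. With a shift over the whole frame, the unfinished event of sensor 1 would be paired with a timestamp belonging to sensor 2.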

Eliminating overlapping times:

data = [
    (202, pd.to_datetime('2012-06-25 00:11:47'),
     pd.to_datetime('2012-06-25 00:11:49'), 2),
    (404, pd.to_datetime('2012-06-25 00:11:48'),
     pd.to_datetime('2012-06-25 00:11:50'), 2)
]
df = pd.DataFrame(data, columns=['sensor_id', 'start', 'end', 'duration'])

df['end_shift'] = df['end'].shift().fillna(pd.to_datetime('1971-01-01'))
df.loc[0, 'end_shift'] = pd.to_datetime('1971-01-01')
df[df['start'] >= df['end_shift']].drop('end_shift', axis=1)

Output:

   sensor_id               start                 end  duration
0        202 2012-06-25 00:11:47 2012-06-25 00:11:49         2
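The shift-based check compares each event only with the row directly before it. If an earlier, longer event can still be running, one possible variant compares each start against the latest end seen so far. This is a sketch on made-up intervals. The rule that the earlier event wins, and that dropped events still count towards the running maximum, are assumptions.

```python
import pandas as pd

# Made-up events: 202 runs until :53 and overlaps both 404 and 303
events = pd.DataFrame({
    'sensor_id': [202, 404, 303, 101],
    'start': pd.to_datetime(['2012-06-25 00:11:47', '2012-06-25 00:11:48',
                             '2012-06-25 00:11:51', '2012-06-25 00:11:55']),
    'end': pd.to_datetime(['2012-06-25 00:11:53', '2012-06-25 00:11:50',
                           '2012-06-25 00:11:52', '2012-06-25 00:11:57']),
})
events = events.sort_values('start').reset_index(drop=True)

# Latest end among all earlier rows (NaT for the first row)
latest_end = events['end'].cummax().shift()
# Keep the first row and every row that starts once all earlier events are over
keep = latest_end.isna() | (events['start'] >= latest_end)
non_overlapping = events[keep]
print(non_overlapping)
```

Here 303 starts after 404 has ended, so a comparison with the previous row alone would keep it, while the running maximum drops it because 202 is still active.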

Grouped durations:

data = [
(202, pd.to_datetime('2020-06-25 00:11:43'), pd.to_datetime('2020-06-25 00:11:45'),2), 
(202, pd.to_datetime('2020-06-25 00:11:47'), pd.to_datetime('2020-06-25 00:11:49'),2),
(404, pd.to_datetime('2020-06-25 00:11:51'), pd.to_datetime('2020-06-25 00:11:54'),3),
(404, pd.to_datetime('2020-06-25 00:11:55'), pd.to_datetime('2020-06-25 00:11:57'),2),
(202, pd.to_datetime('2020-06-25 00:11:58'), pd.to_datetime('2020-06-25 00:12:01'),3),
(202, pd.to_datetime('2020-06-25 00:12:18'), pd.to_datetime('2020-06-25 00:12:21'),3),
(101, pd.to_datetime('2020-06-25 00:12:21'), pd.to_datetime('2020-06-25 00:12:23'),2),
(101, pd.to_datetime('2020-06-25 00:12:32'), pd.to_datetime('2020-06-25 00:12:34'),2),
]
df=pd.DataFrame(data, columns=['sensor_id', 'start', 'end', 'duration'])

df['id'] = df['sensor_id'].shift(-1)
df['cumsum'] = df['duration'].cumsum()
df2 = df[df['id'] != df['sensor_id']].copy()
df2['duration2'] = df2['cumsum'] - df2['cumsum'].shift().fillna(0)
df2[['sensor_id', 'duration2']]

Output:

   sensor_id  duration2
1        202        4.0
3        404        5.0
5        202        6.0
7        101        4.0

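The requirements were not clear from the start: all the durations computed originally are discarded and new ones are recalculated. Had the requirements been clear, the solution would have been shorter.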

data = [
(202, pd.to_datetime('2020-06-25 00:11:43'), pd.to_datetime('2020-06-25 00:11:45'),2), 
(202, pd.to_datetime('2020-06-25 00:11:47'), pd.to_datetime('2020-06-25 00:11:49'),2),
(404, pd.to_datetime('2020-06-25 00:11:51'), pd.to_datetime('2020-06-25 00:11:54'),3),
(404, pd.to_datetime('2020-06-25 00:11:55'), pd.to_datetime('2020-06-25 00:11:57'),2),
(202, pd.to_datetime('2020-06-25 00:11:58'), pd.to_datetime('2020-06-25 00:12:01'),3),
(202, pd.to_datetime('2020-06-25 00:12:18'), pd.to_datetime('2020-06-25 00:12:21'),3),
(101, pd.to_datetime('2020-06-25 00:12:21'), pd.to_datetime('2020-06-25 00:12:23'),2),
(101, pd.to_datetime('2020-06-25 00:12:32'), pd.to_datetime('2020-06-25 00:12:34'),2),
]
df=pd.DataFrame(data, columns=['sensor_id', 'start', 'end', 'duration'])

df['id1'] = df['sensor_id'].shift(-1)
df['id2'] = df['sensor_id'].shift(1)

df2 = df[df['id1'] != df['sensor_id']].copy().reset_index()
df2['start'] = df[df['id2'] != df['sensor_id']].reset_index()['start']

df2['duration'] = df2['end'] - df2['start']
df2.drop(['id1', 'id2'], axis=1, inplace=True) 
df2

Output:

   index  sensor_id               start                 end duration
0      1        202 2020-06-25 00:11:43 2020-06-25 00:11:49 00:00:06
1      3        404 2020-06-25 00:11:51 2020-06-25 00:11:57 00:00:06
2      5        202 2020-06-25 00:11:58 2020-06-25 00:12:21 00:00:23
3      7        101 2020-06-25 00:12:21 2020-06-25 00:12:34 00:00:13
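Both groupings can also be expressed with a single run label, where a new run starts whenever sensor_id differs from the previous row. The following is a sketch of that alternative rather than the method used above. The column names active_seconds and span are made up for illustration.

```python
import pandas as pd

rows = [
    (202, '2020-06-25 00:11:43', '2020-06-25 00:11:45', 2),
    (202, '2020-06-25 00:11:47', '2020-06-25 00:11:49', 2),
    (404, '2020-06-25 00:11:51', '2020-06-25 00:11:54', 3),
    (404, '2020-06-25 00:11:55', '2020-06-25 00:11:57', 2),
    (202, '2020-06-25 00:11:58', '2020-06-25 00:12:01', 3),
    (202, '2020-06-25 00:12:18', '2020-06-25 00:12:21', 3),
    (101, '2020-06-25 00:12:21', '2020-06-25 00:12:23', 2),
    (101, '2020-06-25 00:12:32', '2020-06-25 00:12:34', 2),
]
df = pd.DataFrame(rows, columns=['sensor_id', 'start', 'end', 'duration'])
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

# Label consecutive rows of the same sensor with the same run number
run = (df['sensor_id'] != df['sensor_id'].shift()).cumsum().rename('run')

out = df.groupby(run).agg(
    sensor_id=('sensor_id', 'first'),
    start=('start', 'first'),          # start of the first event in the run
    end=('end', 'last'),               # end of the last event in the run
    active_seconds=('duration', 'sum'),  # sum of the original durations
).reset_index(drop=True)
out['span'] = out['end'] - out['start']  # wall-clock length of the run
print(out)
```

active_seconds corresponds to the duration2 column of the first variant, and span to the recomputed duration of the second one.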
Votes: 0
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/62687886
