首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >熊猫:如何处理特定数据栏的重叠时间?

熊猫:如何处理特定数据栏的重叠时间?
EN

Stack Overflow用户
提问于 2021-01-20 14:52:06
回答 1查看 152关注 0票数 1

我正试图用一个数据文件来处理重叠的时标,我发现很难继续下去。我有一个数据格式,它的值如下所示。我的数据中大约有40万行。有谁能帮我了解一下我能用什么功能来获得结果(对于3种用例)?

非常感谢您的帮助,提前。

我有4个用例:

1.例1:告警1没有EndTime,但是由于Alarm 1、2和3有重叠时间,所以我需要最低的启动时间和最大的开始时间(在这里,在这里,它没有结束)

代码语言:javascript
复制
   |------ALARM1--------- (no end)
    |----ALARM2-----|
|-------ALARM3--|      

<--------------------------...


    

   ELEMENT                                      TEXT                                       ID         START                       END

   OLT5134-LT4-PON0-ONT25                         ALARM1                                    5        2021-01-19 05:00:00                 NaT
   OLT5134-LT4-PON0-ONT25                         ALARM2                                 772874243   2021-01-20 06:00:00     2021-01-20 08:00:00
   OLT5134-LT4-PON0-ONT25                         ALARM3                                 772874243   2021-01-20 04:00:19     2021-01-20 10:03:16
   
   Result 
   OLT5134-LT4-PON0-ONT25                         ALARM3                                    5        2021-01-19 04:00:00                 NaT

2.案例2: Alarm2完全包含在Alarm1

代码语言:javascript
复制
|-----ALARM1----|
   |---ALARM2-| 

<--------------->   

   ELEMENT                                      TEXT                                       ID         START                       END

   OLT5134-LT4-PON0-ONT26                         ALARM1                                    5        2021-01-19 05:00:00     2021-01-20 10:00:00
   OLT5134-LT4-PON0-ONT26                         ALARM2                                 772874243   2021-01-20 06:00:00     2021-01-20 08:00:00

   
   Result 
   OLT5134-LT4-PON0-ONT26                         ALARM1                                    5        2021-01-19 05:00:00     2021-01-20 10:00:00

3.案例3: Alarm2和Alarm1有一个共同的重叠时间

代码语言:javascript
复制
|-----ALARM1----|
           |---ALARM2-| 
<--------------------->

   ELEMENT                                      TEXT                                       ID         START                       END

   OLT5134-LT4-PON0-ONT27                         ALARM1                                    5        2021-01-17 05:00:00     2021-01-19 10:00:00
   OLT5134-LT4-PON0-ONT27                         ALARM2                                 772874243   2021-01-18 06:00:00     2021-01-20 08:00:00

   
   Result 
   OLT5134-LT4-PON0-ONT27                         ALARM1                                    5        2021-01-17 05:00:00     2021-01-20 08:00:00

案例4: Alarm2完全开始于Alarm1的末尾

在这里,我只需要得到一行,它包含Alarm1的开头和Alarm2的末尾

我试过使用下面的代码,但它没有帮助,我不知道如何解决这个问题。

代码语言:javascript
复制
df['newid']=(df['START']-df['END'].shift()).dt.total_seconds().gt(0).cumsum()

print(df.to_string(index=False))
grouped = df.groupby(df['ELEMENT'])
print(grouped.get_group('OLT5134-LT4-PON0-ONT29').to_string(index=False))
print(grouped.get_group('OLT5134-LT4-PON0-ONT25').to_string(index=False))

我是个新手,需要帮助。

代码语言:javascript
复制
                ELEMENT                                     TEXT         ID               START                 END  newid
 OLT5134-LT4-PON0-ONT29                            ACCES_DEGRADE          5 2021-01-19 18:07:22                 NaT  10975
 OLT5134-LT4-PON0-ONT29                            ACCES_DEGRADE          5 2021-01-19 19:07:42                 NaT  10975
 OLT5134-LT4-PON0-ONT29                            ACCES_DEGRADE          5 2021-01-19 19:52:57                 NaT  10975
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 19:52:57 2021-01-19 21:38:23  10975
 OLT5134-LT4-PON0-ONT29                            ACCES_DEGRADE          5 2021-01-19 23:09:02                 NaT  10976
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 23:09:02 2021-01-19 23:23:58  10976
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 08:50:54 2021-01-20 09:05:51  10977
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 10:06:18 2021-01-20 10:36:21  10978
 OLT5134-LT4-PON0-ONT29  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 11:17:29 2021-01-20 12:02:37  10979
                ELEMENT                                     TEXT         ID               START                 END  newid
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 20:59:06 2021-01-19 21:29:08  10971
 OLT5134-LT4-PON0-ONT25                            ACCES_DEGRADE          5 2021-01-19 23:24:53 2021-01-22 03:14:53  10972
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 06:47:20 2021-01-20 07:02:17  10972
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 09:48:19 2021-01-20 10:03:16  10973
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 10:18:29 2021-01-20 11:18:41  10974
 OLT5134-LT4-PON0-ONT25  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-20 11:33:55 2021-01-20 11:48:52  10975

当我有以下输入时:

代码语言:javascript
复制
                ELEMENT                                     TEXT         ID               START                 END
 OLT2227-LT3-PON0-ONT27  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 22:00:00 2021-01-19 22:30:00
 OLT2227-LT3-PON0-ONT27  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 22:30:00 2021-01-19 23:44:12
 OLT2227-LT3-PON0-ONT27  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 23:30:00 2021-01-19 23:46:12
 OLT2227-LT3-PON0-ONT30  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 19:30:00                 NaT
 OLT2227-LT3-PON0-ONT30  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 20:00:00 2021-01-19 23:00:00
 OLT2227-LT3-PON0-ONT31  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 18:00:00 2021-01-19 19:00:00
 OLT2227-LT3-PON0-ONT31  The signal degrade of ONTi (SDi) occurs  772874243 2021-01-19 20:30:00 2021-01-19 23:00:00

对于one 2227-LT3-PON0-ONT 31元素,由于没有重叠,它必须打印2行而不是1行。

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-01-20 18:44:17

我知道可能有比下面更简单和最优的解决方案。但是,如果您有大量数据,则可以将group_byapply结合使用,这样执行起来可能会很慢。

代码语言:javascript
复制
import pandas as pd
from io import StringIO
import numpy as np

# Data preprocessing(ignore)
data = StringIO('''
   OLT5134-LT4-PON0-ONT25,                         ALARM1,                                    5,        2021-01-19 05:00:00,     NaT       
   OLT5134-LT4-PON0-ONT25,                         ALARM2,                                 772874243,   2021-01-20 06:00:00,     2021-01-20 08:00:00
   OLT5134-LT4-PON0-ONT25,                         ALARM3,                                 772874243,   2021-01-20 04:00:19,     2021-01-20 10:03:16
   OLT5134-LT4-PON0-ONT26,                         ALARM1,                                    5,        2021-01-19 05:00:00,     2021-01-20 10:00:00
   OLT5134-LT4-PON0-ONT26,                         ALARM2,                                 772874243,   2021-01-20 06:00:00,     2021-01-20 08:00:00
   OLT5134-LT4-PON0-ONT27,                         ALARM1,                                    5,        2021-01-17 05:00:00,     2021-01-19 10:00:00
   OLT5134-LT4-PON0-ONT27,                         ALARM2,                                 772874243,   2021-01-18 06:00:00,     2021-01-20 08:00:00
''')

df = pd.read_csv(data,names=['ELEMENT','TEXT','ID','START','END'])
df['START'] = pd.to_datetime(df['START'])
df['END'] = pd.to_datetime(df['END'],errors ='coerce')


#See from below
gb = df.groupby(by ='ELEMENT')

def group_filter(group):
  
  min = group['START'].min()
  max = group['END'].max()
  id = group.loc[ group['START'] == min]['ID'].iloc[0]
  text = group.loc[group['START'] == min]['TEXT'].iloc[0]
  result = None

  #Put your conditions here....
  if np.isnat(group['END']).any():
    max = pd.to_datetime('NAT',errors='coerce')
    result = pd.Series([id,text,min,max])
  
  result = pd.Series([id,text,min,max])
  return result
  
output = gb.apply(group_filter).reset_index()
output.columns = df.columns
output

输出:

代码语言:javascript
复制
ELEMENT TEXT    ID  START   END
0   OLT5134-LT4-PON0-ONT25  5   ALARM1  2021-01-19 05:00:00 NaT
1   OLT5134-LT4-PON0-ONT26  5   ALARM1  2021-01-19 05:00:00 2021-01-20 10:00:00
2   OLT5134-LT4-PON0-ONT27  5   ALARM1  2021-01-17 05:00:00 2021-01-20 08:00:00
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/65811941

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档