我正试图用一个数据文件来处理重叠的时标,我发现很难继续下去。我有一个数据格式,它的值如下所示。我的数据中大约有40万行。有谁能帮我了解一下我能用什么功能来获得结果(对于3种用例)?
非常感谢您的帮助,提前。
我有4个用例:
1.例1:告警1没有EndTime,但是由于Alarm 1、2和3有重叠时间,所以我需要最低的启动时间和最大的开始时间(在这里,在这里,它没有结束)
|------ALARM1--------- (no end)
|----ALARM2-----|
|-------ALARM3--|
<--------------------------...
ELEMENT TEXT ID START END
OLT5134-LT4-PON0-ONT25 ALARM1 5 2021-01-19 05:00:00 NaT
OLT5134-LT4-PON0-ONT25 ALARM2 772874243 2021-01-20 06:00:00 2021-01-20 08:00:00
OLT5134-LT4-PON0-ONT25 ALARM3 772874243 2021-01-20 04:00:19 2021-01-20 10:03:16
Result
OLT5134-LT4-PON0-ONT25 ALARM3 5 2021-01-19 04:00:00 NaT2.案例2: Alarm2完全包含在Alarm1中
|-----ALARM1----|
|---ALARM2-|
<--------------->
ELEMENT TEXT ID START END
OLT5134-LT4-PON0-ONT26 ALARM1 5 2021-01-19 05:00:00 2021-01-20 10:00:00
OLT5134-LT4-PON0-ONT26 ALARM2 772874243 2021-01-20 06:00:00 2021-01-20 08:00:00
Result
OLT5134-LT4-PON0-ONT26 ALARM1 5 2021-01-19 05:00:00 2021-01-20 10:00:003.案例3: Alarm2和Alarm1有一个共同的重叠时间
|-----ALARM1----|
|---ALARM2-|
<--------------------->
ELEMENT TEXT ID START END
OLT5134-LT4-PON0-ONT27 ALARM1 5 2021-01-17 05:00:00 2021-01-19 10:00:00
OLT5134-LT4-PON0-ONT27 ALARM2 772874243 2021-01-18 06:00:00 2021-01-20 08:00:00
Result
OLT5134-LT4-PON0-ONT27 ALARM1 5 2021-01-17 05:00:00 2021-01-20 08:00:00案例4: Alarm2完全开始于Alarm1的末尾
在这里,我只需要得到一行,它包含Alarm1的开头和Alarm2的末尾
我试过使用下面的代码,但它没有帮助,我不知道如何解决这个问题。
df['newid']=(df['START']-df['END'].shift()).dt.total_seconds().gt(0).cumsum()
print(df.to_string(index=False))
grouped = df.groupby(df['ELEMENT'])
print(grouped.get_group('OLT5134-LT4-PON0-ONT29').to_string(index=False))
print(grouped.get_group('OLT5134-LT4-PON0-ONT25').to_string(index=False))我是个新手,需要帮助。
ELEMENT TEXT ID START END newid
OLT5134-LT4-PON0-ONT29 ACCES_DEGRADE 5 2021-01-19 18:07:22 NaT 10975
OLT5134-LT4-PON0-ONT29 ACCES_DEGRADE 5 2021-01-19 19:07:42 NaT 10975
OLT5134-LT4-PON0-ONT29 ACCES_DEGRADE 5 2021-01-19 19:52:57 NaT 10975
OLT5134-LT4-PON0-ONT29 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 19:52:57 2021-01-19 21:38:23 10975
OLT5134-LT4-PON0-ONT29 ACCES_DEGRADE 5 2021-01-19 23:09:02 NaT 10976
OLT5134-LT4-PON0-ONT29 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 23:09:02 2021-01-19 23:23:58 10976
OLT5134-LT4-PON0-ONT29 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-20 08:50:54 2021-01-20 09:05:51 10977
OLT5134-LT4-PON0-ONT29 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-20 10:06:18 2021-01-20 10:36:21 10978
OLT5134-LT4-PON0-ONT29 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-20 11:17:29 2021-01-20 12:02:37 10979
ELEMENT TEXT ID START END newid
OLT5134-LT4-PON0-ONT25 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 20:59:06 2021-01-19 21:29:08 10971
OLT5134-LT4-PON0-ONT25 ACCES_DEGRADE 5 2021-01-19 23:24:53 2021-01-22 03:14:53 10972
OLT5134-LT4-PON0-ONT25 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-20 06:47:20 2021-01-20 07:02:17 10972
OLT5134-LT4-PON0-ONT25 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-20 09:48:19 2021-01-20 10:03:16 10973
OLT5134-LT4-PON0-ONT25 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-20 10:18:29 2021-01-20 11:18:41 10974
OLT5134-LT4-PON0-ONT25 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-20 11:33:55 2021-01-20 11:48:52 10975当我有以下输入时:
ELEMENT TEXT ID START END
OLT2227-LT3-PON0-ONT27 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 22:00:00 2021-01-19 22:30:00
OLT2227-LT3-PON0-ONT27 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 22:30:00 2021-01-19 23:44:12
OLT2227-LT3-PON0-ONT27 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 23:30:00 2021-01-19 23:46:12
OLT2227-LT3-PON0-ONT30 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 19:30:00 NaT
OLT2227-LT3-PON0-ONT30 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 20:00:00 2021-01-19 23:00:00
OLT2227-LT3-PON0-ONT31 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 18:00:00 2021-01-19 19:00:00
OLT2227-LT3-PON0-ONT31 The signal degrade of ONTi (SDi) occurs 772874243 2021-01-19 20:30:00 2021-01-19 23:00:00对于one 2227-LT3-PON0-ONT 31元素,由于没有重叠,它必须打印2行而不是1行。
发布于 2021-01-20 18:44:17
我知道可能有比下面更简单和最优的解决方案。但是,如果您有大量数据,则可以将group_by与apply结合使用,这样执行起来可能会很慢。
import pandas as pd
from io import StringIO
import numpy as np
# Data preprocessing(ignore)
data = StringIO('''
OLT5134-LT4-PON0-ONT25, ALARM1, 5, 2021-01-19 05:00:00, NaT
OLT5134-LT4-PON0-ONT25, ALARM2, 772874243, 2021-01-20 06:00:00, 2021-01-20 08:00:00
OLT5134-LT4-PON0-ONT25, ALARM3, 772874243, 2021-01-20 04:00:19, 2021-01-20 10:03:16
OLT5134-LT4-PON0-ONT26, ALARM1, 5, 2021-01-19 05:00:00, 2021-01-20 10:00:00
OLT5134-LT4-PON0-ONT26, ALARM2, 772874243, 2021-01-20 06:00:00, 2021-01-20 08:00:00
OLT5134-LT4-PON0-ONT27, ALARM1, 5, 2021-01-17 05:00:00, 2021-01-19 10:00:00
OLT5134-LT4-PON0-ONT27, ALARM2, 772874243, 2021-01-18 06:00:00, 2021-01-20 08:00:00
''')
df = pd.read_csv(data,names=['ELEMENT','TEXT','ID','START','END'])
df['START'] = pd.to_datetime(df['START'])
df['END'] = pd.to_datetime(df['END'],errors ='coerce')
#See from below
gb = df.groupby(by ='ELEMENT')
def group_filter(group):
min = group['START'].min()
max = group['END'].max()
id = group.loc[ group['START'] == min]['ID'].iloc[0]
text = group.loc[group['START'] == min]['TEXT'].iloc[0]
result = None
#Put your conditions here....
if np.isnat(group['END']).any():
max = pd.to_datetime('NAT',errors='coerce')
result = pd.Series([id,text,min,max])
result = pd.Series([id,text,min,max])
return result
output = gb.apply(group_filter).reset_index()
output.columns = df.columns
output输出:
ELEMENT TEXT ID START END
0 OLT5134-LT4-PON0-ONT25 5 ALARM1 2021-01-19 05:00:00 NaT
1 OLT5134-LT4-PON0-ONT26 5 ALARM1 2021-01-19 05:00:00 2021-01-20 10:00:00
2 OLT5134-LT4-PON0-ONT27 5 ALARM1 2021-01-17 05:00:00 2021-01-20 08:00:00https://stackoverflow.com/questions/65811941
复制相似问题