I have a problem with a pandas DataFrame.
Here is my DataFrame:
ELEMENT TEXT ID START END newid
OLT2227-LT3-PON0-ONT03 USECASE1 - ALARM1 -NO OVERLAP 772874243 2021-01-19 18:00:00 2021-01-19 19:00:00 0
OLT2227-LT3-PON0-ONT03 USECASE1 - ALARM2 - NO OVERLAP 772874243 2021-01-19 19:10:00 2021-01-19 20:00:12 1
OLT2227-LT3-PON0-ONT05 USECASE2 - ALARM1 - Fully Contained 772874243 2021-01-19 18:00:00 2021-01-19 23:00:00 1
OLT2227-LT3-PON0-ONT05 USECASE2 - ALARM2 - Fully Contained 772874243 2021-01-19 19:00:00 2021-01-19 20:00:12 1
OLT2227-LT3-PON0-ONT10 USECASE3 - ALARM1 - START-END-RELATION 772874243 2021-01-19 22:00:00 2021-01-19 22:30:00 2
OLT2227-LT3-PON0-ONT10 USECASE3 - ALARM2 - START-END-RELATION 772874243 2021-01-19 22:30:00 2021-01-19 23:00:12 2
OLT2227-LT3-PON0-ONT21 USECASE3-ALARM1 772874243 2021-01-19 22:00:00 2021-01-19 22:10:00 2
OLT2227-LT3-PON0-ONT21 USECASE3-ALARM2-NO-END 772874243 2021-01-19 22:15:00 NaT 3
OLT2227-LT3-PON0-ONT4 USECASE-4 772874243 2021-01-19 17:30:00 NaT 3
OLT2227-LT3-PON0-ONT4 USECASE-4 772874243 2021-01-19 20:00:00 2021-01-19 23:00:00 3
OLT2227-LT3-PON0-ONT99 USECASE-5 772874243 2021-01-19 17:30:00 2021-01-19 22:00:00 3
OLT2227-LT3-PON0-ONT99 USECASE-5 772874243 2021-01-19 20:00:00 NaT 3

The output I currently get is:
ELEMENT START END
OLT2227-LT3-PON0-ONT03 2021-01-19 18:00:00 2021-01-19 19:00:00
OLT2227-LT3-PON0-ONT03 2021-01-19 19:10:00 2021-01-19 20:00:12
OLT2227-LT3-PON0-ONT05 2021-01-19 18:00:00 2021-01-19 23:00:00
OLT2227-LT3-PON0-ONT10 2021-01-19 22:00:00 2021-01-19 23:00:12
OLT2227-LT3-PON0-ONT21 2021-01-19 22:00:00 2021-01-19 22:10:00
OLT2227-LT3-PON0-ONT21 2021-01-19 22:15:00 NaT
OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00 2021-01-19 23:00:00
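OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00 2021-01-19 22:00:00

This works for every use case except use case 4 and use case 5, where the time ranges overlap. There I need END to be NaT, rather than the following: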
OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00 2021-01-19 23:00:00
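OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00 2021-01-19 22:00:00

Because the time ranges overlap, I want the result to take the minimum START and the maximum END of the range (the maximum being NaT in this case). So the expected result is: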
OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00 NaT
OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00 NaT

The final expected result for all use cases is:
ELEMENT START END
OLT2227-LT3-PON0-ONT03 2021-01-19 18:00:00 2021-01-19 19:00:00
OLT2227-LT3-PON0-ONT03 2021-01-19 19:10:00 2021-01-19 20:00:12
OLT2227-LT3-PON0-ONT05 2021-01-19 18:00:00 2021-01-19 23:00:00
OLT2227-LT3-PON0-ONT10 2021-01-19 22:00:00 2021-01-19 23:00:12
OLT2227-LT3-PON0-ONT21 2021-01-19 22:00:00 2021-01-19 22:10:00
OLT2227-LT3-PON0-ONT21 2021-01-19 22:15:00 NaT
OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00 NaT
OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00 NaT

Here is the code I am using:
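# Start a new group whenever a row's START is later than the previous row's END
df['newid'] = (df['START'] - df['END'].shift()).dt.total_seconds().gt(0).cumsum()
print(df.to_string(index=False))
# Merge each group into a single row spanning the earliest START to the latest END
newdf = df.groupby(['newid', 'ELEMENT']).agg({'START': 'min', 'END': 'max'}).reset_index(level=1)
print(newdf.to_string(index=False))

Can anyone help me achieve this? Thanks.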
Posted on 2021-01-24 08:34:54
I found the answer. I converted NaT to a date in the future, which is a workaround that makes the logic work.
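
The answer does not include code, so the following is only a minimal sketch of how such a workaround might look. It relies on the fact that the 'max' aggregation skips NaT, which is why an open-ended alarm was losing to a real END time. The names FAR_FUTURE and merge_alarms are hypothetical, the sentinel date is an arbitrary choice assumed to be later than any real END, and converting the sentinel back to NaT at the end is an assumption made to match the expected output in the question.

```python
import pandas as pd

# Hypothetical sentinel: assumed to be later than every real END value.
FAR_FUTURE = pd.Timestamp("2200-01-01")

def merge_alarms(df):
    """Merge overlapping alarms per ELEMENT, treating a NaT END as open-ended."""
    work = df.copy()
    # Replace NaT with the sentinel so that 'max' no longer skips it.
    work["END"] = work["END"].fillna(FAR_FUTURE)
    # Same grouping rule as in the question: a new group starts when
    # START is later than the previous row's END.
    work["newid"] = (
        (work["START"] - work["END"].shift()).dt.total_seconds().gt(0).cumsum()
    )
    merged = (
        work.groupby(["newid", "ELEMENT"])
        .agg({"START": "min", "END": "max"})
        .reset_index(level=1)
        .reset_index(drop=True)
    )
    # Turn the sentinel back into NaT for the final result.
    merged["END"] = merged["END"].mask(merged["END"] == FAR_FUTURE)
    return merged

# A subset of the rows from the question (None becomes NaT).
sample = pd.DataFrame({
    "ELEMENT": ["OLT2227-LT3-PON0-ONT03", "OLT2227-LT3-PON0-ONT03",
                "OLT2227-LT3-PON0-ONT4", "OLT2227-LT3-PON0-ONT4",
                "OLT2227-LT3-PON0-ONT99", "OLT2227-LT3-PON0-ONT99"],
    "START": pd.to_datetime(["2021-01-19 18:00:00", "2021-01-19 19:10:00",
                             "2021-01-19 17:30:00", "2021-01-19 20:00:00",
                             "2021-01-19 17:30:00", "2021-01-19 20:00:00"]),
    "END": pd.to_datetime(["2021-01-19 19:00:00", "2021-01-19 20:00:12",
                           None, "2021-01-19 23:00:00",
                           "2021-01-19 22:00:00", None]),
})

result = merge_alarms(sample)
print(result.to_string(index=False))
```

With this subset, the ONT4 and ONT99 rows should each collapse to a single row starting at 17:30:00 with END shown as NaT, while the two non-overlapping ONT03 alarms stay separate.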
https://stackoverflow.com/questions/65819540