I have "log" information about who is working, on which task, and when he/she started working:
index | Entrance time | Name | Last name | Employee_ID | Task
--------------------------------------------------------------------
0 |2000-01-01 00:00:00 | John | Fischer | 001 | Maintenance
1 |2000-01-01 00:04:30 | John | Fischer | 001 | Development
2 |2000-01-01 00:04:30 | Bob | Conrad | 002 | Maintenance
3 |2000-01-01 00:10:00 | Mary | Smith | 003 | Multitasking
4 |2000-01-01 00:09:30 | John | Fischer | 001 | Maintenance
5 |2000-01-01 00:15:30 | John | Fischer | 001 | Maintenance
6 |2000-01-02 00:04:30 | Bob | Conrad | 002 | Maintenance
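7 |2000-01-02 00:10:00 | Mary | Smith | 003 | Multitasking
Then I want to eliminate duplicates: a row should be dropped if its task and name are the same as another row's and the difference between their entrance times is below 10 minutes. So the resulting data should be: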
index | Entrance time | Name | Last name | Employee_ID | Task
--------------------------------------------------------------------
0 |2000-01-01 00:00:00 | John | Fischer | 001 | Maintenance
1 |2000-01-01 00:04:30 | John | Fischer | 001 | Development
2 |2000-01-01 00:04:30 | Bob | Conrad | 002 | Maintenance
3 |2000-01-01 00:10:00 | Mary | Smith | 003 | Multitasking
5 |2000-01-01 00:15:30 | John | Fischer | 001 | Maintenance
6 |2000-01-02 00:04:30 | Bob | Conrad | 002 | Maintenance
7 |2000-01-02 00:10:00 | Mary | Smith | 003 | Multitasking
I used drop_duplicates(subset=["Name", "Last name", "Task"]), but I don't know how to apply the time condition to compare each row with the rest.
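I hope you can help me, thanks.
Posted on 2020-10-22 16:23:59
This may help you with computing the time difference. However, you still need to apply your condition based on the duplicates.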
import numpy as np
# Make df sequential in ["Name", "Last name", "Task"]
df.sort_values(["Name", "Last name", "Task"], inplace=True)
# Compute the time difference to the previous row
# (note: the previous row may belong to a different person or task)
temp = df['Entrance time'] - df['Entrance time'].shift()
# Convert the difference to minutes (taking absolute values)
df['diff_mins'] = temp.abs() / np.timedelta64(1, 'm')
Output:
2 2 2000-01-01 00:04:30 Bob Conrad 2 Maintenance nan
6 6 2000-01-02 00:04:30 Bob Conrad 2 Maintenance 1440
1 1 2000-01-01 00:04:30 John Fischer 1 Development 1440
0 0 2000-01-01 00:00:00 John Fischer 1 Maintenance 4.5
4 4 2000-01-01 00:09:30 John Fischer 1 Maintenance 9.5
5 5 2000-01-01 00:15:30 John Fischer 1 Maintenance 6
3 3 2000-01-01 00:10:00 Mary Smith 3 Multitasking 5.5
7 7 2000-01-02 00:10:00 Mary Smith 3 Multitasking 1440
https://stackoverflow.com/questions/64465943
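One possible way to apply the 10-minute condition from the question is sketched below. It rests on a few assumptions that the thread does not confirm: a row is compared with the last kept row of the same Name / Last name / Task group (the expected result suggests this, since index 5 is kept although it is only 6 minutes after the dropped index 4), "Entrance time" is already a datetime column, and the index is unique. The helper name drop_close_repeats is made up for illustration.

```python
import pandas as pd


def drop_close_repeats(df, keys, time_col="Entrance time",
                       window=pd.Timedelta(minutes=10)):
    """Hypothetical helper: keep a row only if it comes at least `window`
    after the last *kept* row with the same values in `keys`."""
    # Walk the rows in chronological order without changing df itself.
    ordered = df.sort_values(time_col, kind="stable")
    last_kept = {}   # group key -> timestamp of the last kept row
    keep = []        # index labels of the rows that survive
    group_keys = ordered[keys].itertuples(index=False, name=None)
    for idx, key, ts in zip(ordered.index, group_keys, ordered[time_col]):
        prev = last_kept.get(key)
        if prev is None or ts - prev >= window:
            last_kept[key] = ts
            keep.append(idx)
    # Return the surviving rows in their original order.
    return df.loc[df.index.isin(keep)]


# Sample data copied from the question.
df = pd.DataFrame({
    "Entrance time": pd.to_datetime([
        "2000-01-01 00:00:00", "2000-01-01 00:04:30", "2000-01-01 00:04:30",
        "2000-01-01 00:10:00", "2000-01-01 00:09:30", "2000-01-01 00:15:30",
        "2000-01-02 00:04:30", "2000-01-02 00:10:00",
    ]),
    "Name": ["John", "John", "Bob", "Mary", "John", "John", "Bob", "Mary"],
    "Last name": ["Fischer", "Fischer", "Conrad", "Smith",
                  "Fischer", "Fischer", "Conrad", "Smith"],
    "Employee_ID": ["001", "001", "002", "003", "001", "001", "002", "003"],
    "Task": ["Maintenance", "Development", "Maintenance", "Multitasking",
             "Maintenance", "Maintenance", "Maintenance", "Multitasking"],
})

result = drop_close_repeats(df, ["Name", "Last name", "Task"])
print(result.index.tolist())  # [0, 1, 2, 3, 5, 6, 7]
```

On the sample data only index 4 is removed, which matches the expected table in the question. The loop is plain Python, so for a very large log a vectorized approach may be preferable. The explicit loop is used here because whether a row is kept depends on which earlier rows were kept, and a single shift() comparison cannot express that.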