I have a DataFrame containing daily login/logout log entries. Some of the login and logout values occur on the same day. For example, df:
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| Id | LoginTime | LoginReading | LogoutTime | LogoutReading | Available | Calculated |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| 10036 | 2019-11-06 10:37:18.3743184 | 5054.68 | 2019-11-06 14:11:52.4833904 | 5057.94 | 500 | 530 |
| 10036 | 2019-11-11 12:46:20.7018683 | 4797.39 | 2019-11-11 15:55:48.9102874 | 4846.86 | 500 | 1091 |
| 10036 | 2019-11-11 15:56:42.7485310 | 4846.86 | 2019-11-11 17:20:40.2587713 | 4866.68 | 500 | 370 |
| 10036 | 2019-11-11 17:21:12.9416691 | 4866.68 | 2019-11-11 18:09:55.8326356 | 4892.33 | 500 | 311 |
| 10036 | 2019-11-14 08:08:00.6290260 | 5080.59 | 2019-11-14 18:43:31.8609822 | 5185.65 | 500 | 2528 |
| 10036 | 2019-11-15 06:00:48.4777280 | 5185.65 | 2019-11-15 06:31:55.0281168 | 5199.28 | 500 | 260 |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+

I am trying to squash/merge the rows that share the same date into a single row. My goal is to end up with something like this:
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| Id | LoginTime | LoginReading | LogoutTime | LogoutReading | Available | Calculated |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| 10036 | 2019-11-06 10:37:18.3743184 | 5054.68 | 2019-11-06 14:11:52.4833904 | 5057.94 | 500 | 530 |
| 10036 | 2019-11-11 12:46:20.7018683 | 4797.39 | 2019-11-11 18:09:55.8326356 | 4892.33 | 500 | 1772 |
| 10036 | 2019-11-14 08:08:00.6290260 | 5080.59 | 2019-11-14 18:43:31.8609822 | 5185.65 | 500 | 2528 |
| 10036 | 2019-11-15 06:00:48.4777280 | 5185.65 | 2019-11-15 06:31:55.0281168 | 5199.28 | 500 | 260 |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+

I tried the approaches from a few similar questions on SO, but they did not work for the logout time.
Answered on 2019-11-25 13:26:29

You can create a new column that tracks the login date, then use groupby() and agg() to compile the desired values:
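# Extract the calendar date (YYYY-MM-DD) from the LoginTime string.
df['Day'] = df['LoginTime'].str.extract(r'^(\d{4}-\d{2}-\d{2})', expand=False)
# Per day: keep the first login, the last logout, and the sum of Calculated.
df.groupby('Day').agg({'Id': 'first', 'LoginTime': 'first', 'LoginReading': 'first', 'LogoutTime': 'last',
                       'LogoutReading': 'last', 'Available': 'first', 'Calculated': 'sum'}).reset_index(drop=True)

Yields: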
Id LoginTime LoginReading \
0 10036 2019-11-06 10:37:18.3743184 5054.68
1 10036 2019-11-11 12:46:20.7018683 4797.39
2 10036 2019-11-14 08:08:00.6290260 5080.59
3 10036 2019-11-15 06:00:48.4777280 5185.65
LogoutTime LogoutReading Available Calculated
0 2019-11-06 14:11:52.4833904 5057.94 500 530
1 2019-11-11 18:09:55.8326356 4892.33 500 1772
2 2019-11-14 18:43:31.8609822 5185.65 500 2528
3 2019-11-15 06:31:55.0281168 5199.28 500 260

https://stackoverflow.com/questions/59025606
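
The answer above groups on the day alone, which is fine for the sample because it contains a single Id. If the real data holds several Ids, rows from different Ids that share a date would be merged together. The following is a minimal sketch of one way to avoid that, assuming the timestamps are fixed-width strings as shown in the question and that pandas 0.25 or later is available for named aggregation. The DataFrame is rebuilt from the question's table, and the variable name `merged` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "Id": [10036] * 6,
    "LoginTime": [
        "2019-11-06 10:37:18.3743184",
        "2019-11-11 12:46:20.7018683",
        "2019-11-11 15:56:42.7485310",
        "2019-11-11 17:21:12.9416691",
        "2019-11-14 08:08:00.6290260",
        "2019-11-15 06:00:48.4777280",
    ],
    "LoginReading": [5054.68, 4797.39, 4846.86, 4866.68, 5080.59, 5185.65],
    "LogoutTime": [
        "2019-11-06 14:11:52.4833904",
        "2019-11-11 15:55:48.9102874",
        "2019-11-11 17:20:40.2587713",
        "2019-11-11 18:09:55.8326356",
        "2019-11-14 18:43:31.8609822",
        "2019-11-15 06:31:55.0281168",
    ],
    "LogoutReading": [5057.94, 4846.86, 4866.68, 4892.33, 5185.65, 5199.28],
    "Available": [500] * 6,
    "Calculated": [530, 1091, 370, 311, 2528, 260],
})

merged = (
    # Assumption: the first 10 characters of LoginTime are the date (YYYY-MM-DD).
    df.assign(Day=df["LoginTime"].str[:10])
      # Sort so that "first" and "last" mean earliest and latest within a day.
      # Fixed-width timestamp strings sort chronologically.
      .sort_values(["Id", "LoginTime"])
      # Group on Id as well as Day so different Ids are never merged.
      .groupby(["Id", "Day"], as_index=False)
      .agg(
          LoginTime=("LoginTime", "first"),
          LoginReading=("LoginReading", "first"),
          LogoutTime=("LogoutTime", "last"),
          LogoutReading=("LogoutReading", "last"),
          Available=("Available", "first"),
          Calculated=("Calculated", "sum"),
      )
      # The helper column is no longer needed once the groups are collapsed.
      .drop(columns="Day")
)

print(merged)
```

Sorting before grouping is a defensive choice here: the sample is already in chronological order, but 'first' and 'last' depend on row order, so an unsorted log could otherwise pick the wrong login or logout.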