I have a dataframe with the following data.

I want to compute, for each pair, the time between the created time with status 'application' and the created time with status 'calculation'. There is no 'pairs' column in the data; to make it easier to follow, I labeled the pairs A, B, C.
pairs Transaction group Application number Created time Status
A 221110000363 HB202211100902000100 2022-11-10 09:05 application
A 221110000363 HB202211100902000100 2022-11-10 13:42 calculation
B 221110000363 HB202211100902000200 2022-11-10 14:02 application
B 221110000363 HB202211100902000200 2022-11-10 14:07 calculation
C 221110000363 HB202211100902000200 2022-11-10 14:43 application
C 221110000363 HB202211100902000200 2022-11-10 15:03 calculation

I tried to split it into two dataframes: one containing all the applications for the transaction group - df1:
pairs Transaction group Application number start_time Status
A 221110000363 HB202211100902000100 2022-11-10 09:05 application
B 221110000363 HB202211100902000200 2022-11-10 14:02 application
C 221110000363 HB202211100902000200 2022-11-10 14:43 application

and another containing all the calculations - df2:
pairs Transaction group Application number stop_time Status
A 221110000363 HB202211100902000100 2022-11-10 13:42 calculation
B 221110000363 HB202211100902000200 2022-11-10 14:07 calculation
C 221110000363 HB202211100902000200 2022-11-10 15:03 calculation

In df1 I renamed 'Created time' to 'start_time' and in df2 to 'stop_time', but after that I do not know how to merge the two dataframes so that each row holds one pair with both times, start and stop. It is also important that this is keyed on 'Transaction group' and 'Application number'.
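The merge step described above can be done on the shared key columns. A minimal sketch, assuming df1 and df2 are built from the example rows (the literal values below are copied from the tables and not from any real dataset):

```python
import pandas as pd

# hypothetical df1/df2 reconstructed from the example tables above
df1 = pd.DataFrame({
    'pairs': ['A', 'B', 'C'],
    'Transaction group': ['221110000363'] * 3,
    'Application number': ['HB202211100902000100',
                           'HB202211100902000200',
                           'HB202211100902000200'],
    'start_time': pd.to_datetime(['2022-11-10 09:05',
                                  '2022-11-10 14:02',
                                  '2022-11-10 14:43']),
})
df2 = pd.DataFrame({
    'pairs': ['A', 'B', 'C'],
    'Transaction group': ['221110000363'] * 3,
    'Application number': ['HB202211100902000100',
                           'HB202211100902000200',
                           'HB202211100902000200'],
    'stop_time': pd.to_datetime(['2022-11-10 13:42',
                                 '2022-11-10 14:07',
                                 '2022-11-10 15:03']),
})

# merge on all shared keys so each row carries both times of one pair
merged = df1.merge(df2,
                   on=['pairs', 'Transaction group', 'Application number'])
print(merged)
```

Including 'pairs' in the merge keys matters here: rows B and C share the same transaction group and application number, so without it the merge would produce a cross of their rows.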
I am looking for a result like this:
pairs Transaction group Application number start_time stop_time
A 221110000363 HB202211100902000100 2022-11-10 09:05 2022-11-10 13:42
B 221110000363 HB202211100902000200 2022-11-10 14:02 2022-11-10 14:07
C 221110000363 HB202211100902000200 2022-11-10 14:43 2022-11-10 15:03

EDIT:
Pairing logic:
Within one transaction group, for each application number I take the earliest 'Created time' whose 'Status' is 'application', then the earliest 'Created time' whose 'Status' is 'calculation'; that is the first pair. The second pair is the second-earliest 'application' datetime followed by the second-earliest 'calculation' datetime, and so on.
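That n-th-earliest pairing can be expressed as a cumulative count per status within each application: sort by time, then number each status occurrence within its transaction group and application. A sketch on a small made-up slice of the data above:

```python
import pandas as pd

df = pd.DataFrame({
    'Application number': ['HB202211100902000100'] * 4,
    'Transaction group': ['221110000363'] * 4,
    'Status': ['application', 'calculation', 'application', 'calculation'],
    'Created time': pd.to_datetime(['2022-11-10 09:05', '2022-11-10 13:42',
                                    '2022-11-10 14:02', '2022-11-10 14:07']),
})

# sort so the counter follows chronological order, then number each
# status occurrence *within* its transaction group and application
df = df.sort_values('Created time')
df['pairs'] = df.groupby(
    ['Transaction group', 'Application number', 'Status']).cumcount()
```

The n-th 'application' and the n-th 'calculation' of the same application number then share the same 'pairs' value.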
EDIT2
Data:
Application number Created time Transaction group Status
HB202211100902000100 2022-11-10 09:05 221110000363 application
HB202211100902000100 2022-11-10 13:42 221110000363 calculation
HB202211100902000100 2022-11-10 14:02 221110000363 application
HB202211100902000100 2022-11-10 14:07 221110000363 calculation
HB202211100902000100 2022-11-10 14:43 221110000363 application
HB202211100902000100 2022-11-10 15:03 221110000363 calculation
HB202211100902000100 2022-11-10 16:24 221110000363 application
HB202205152239000200 2022-05-15 22:53 220515000252 application
HB202205152239000200 2022-05-16 14:57 220515000252 calculation
HB202205152253000100 2022-05-15 22:56 220515000252 application
HB202205152253000100 2022-05-16 14:56 220515000252 calculation
HB202205152257000100 2022-05-15 22:57 220515000252 application
HB202205152257000100 2022-05-16 14:56 220515000252 calculation
HB202205152259000100 2022-05-15 23:00 220515000252 application
HB202205152259000100 2022-05-16 14:56 220515000252 calculation
HB202205152301000100 2022-05-15 23:02 220515000252 application
HB202205152301000100 2022-05-16 14:56 220515000252 calculation
HB202205152302000100 2022-05-15 23:04 220515000252 application
HB202205152302000100 2022-05-16 14:55 220515000252 calculation
HB202205152305000100 2022-05-15 23:07 220515000252 application
HB202205152305000100 2022-05-16 14:55 220515000252 calculation
HB202205152307000100 2022-05-15 23:09 220515000252 application
HB202205152307000100 2022-05-16 14:54 220515000252 calculation
HB202205152312000100 2022-05-15 23:13 220515000252 application
HB202205152312000100 2022-05-16 14:54 220515000252 calculation
HB202205152313000100 2022-05-15 23:17 220515000252 application
HB202205152313000100 2022-05-16 14:54 220515000252 calculation

So I used this:
df['pairs'] = df.groupby(['Status']).cumcount()

My data then looks like this:
Application number Created time Transaction group Status pairs
0 HB202211100902000100 2022-11-10 09:05:28 221110000363 application 0
1 HB202211100902000100 2022-11-10 13:42:44 221110000363 calculation 0
2 HB202211100902000100 2022-11-10 14:02:43 221110000363 application 1
3 HB202211100902000100 2022-11-10 14:07:16 221110000363 calculation 1
4 HB202211100902000100 2022-11-10 14:43:54 221110000363 application 2
5 HB202211100902000100 2022-11-10 15:03:06 221110000363 calculation 2
6 HB202211100902000100 2022-11-10 16:24:26 221110000363 application 3
7 HB202205152239000200 2022-05-15 22:53:18 220515000252 application 4
8 HB202205152239000200 2022-05-16 14:57:14 220515000252 calculation 3
9 HB202205152253000100 2022-05-15 22:56:11 220515000252 application 5
10 HB202205152253000100 2022-05-16 14:56:55 220515000252 calculation 4
11 HB202205152257000100 2022-05-15 22:57:58 220515000252 application 6
12 HB202205152257000100 2022-05-16 14:56:47 220515000252 calculation 5

So the problem starts from the third pair.
使用枢轴后:
df = (df.pivot(index=['pairs', 'Transaction group', 'Application number'],
               columns='Status', values='Created time'))

the data shows:
Status application calculation
pairs Transaction group Application number
0 221110000363 HB202211100902000100 2022-11-10 09:05:28 2022-11-10 13:42:44
1 221110000363 HB202211100902000100 2022-11-10 14:02:43 2022-11-10 14:07:16
2 221110000363 HB202211100902000100 2022-11-10 14:43:54 2022-11-10 15:03:06
3 220515000252 HB202205152239000200 NaT 2022-05-16 14:57:14
221110000363 HB202211100902000100 2022-11-10 16:24:26 NaT
4 220515000252 HB202205152239000200 2022-05-15 22:53:18 NaT
HB202205152253000100 NaT 2022-05-16 14:56:55
5 220515000252 HB202205152253000100 2022-05-15 22:56:11 NaT
HB202205152257000100 NaT 2022-05-16 14:56:47

So, in my view, the most important thing is to build the pairs correctly.
Posted on 2022-11-30 07:55:13
Use pivot and rename:
df['pairs'] = df.groupby('Status').cumcount()
(df.pivot(index=['pairs', 'Transaction group', 'Application number'],
columns='Status', values='Created time')
.rename(columns={'application': 'start_time', 'calculation': 'end_time'})
.reset_index()
)

Output:
Status pairs Transaction group Application number start_time end_time
0 A 221110000363 HB202211100902000100 2022-11-10 09:05:00 2022-11-10 13:42:00
1 B 221110000363 HB202211100902000200 2022-11-10 14:02:00 2022-11-10 14:07:00
2 C 221110000363 HB202211100902000200 2022-11-10 14:43:00 2022-11-10 15:03:00

https://stackoverflow.com/questions/74624381
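Once start_time and end_time sit in one row, the duration asked about at the top is a plain datetime subtraction. A sketch, reusing two of the example pairs (the column names 'start_time'/'end_time' follow the answer above):

```python
import pandas as pd

out = pd.DataFrame({
    'start_time': pd.to_datetime(['2022-11-10 09:05', '2022-11-10 14:02']),
    'end_time': pd.to_datetime(['2022-11-10 13:42', '2022-11-10 14:07']),
})

# subtracting two datetime columns yields a Timedelta column
out['duration'] = out['end_time'] - out['start_time']
print(out['duration'])
```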