我有个数据,
Check In Date Check Out Date Number stage
2020/5/22 16:23 2020/5/22 18:39 1 a
2020/5/22 22:41 2020/5/23 2:03 1 b
2020/5/23 2:04 2020/5/23 2:04 1 c
2020/5/23 2:04 2020/5/23 2:56 1 d
2020/5/23 2:56 2020/5/23 2:56 2 a
2020/5/24 8:39 2020/5/24 8:39 2 b
2020/5/24 8:40 2020/5/24 10:58 2 c
2020/5/24 10:59 2020/5/24 10:59 2 d
df = pd.DataFrame({'Check In Date': ['2020/5/22 16:23', '2020/5/22 22:41', '2020/5/23 2:04', '2020/5/23 2:04', '2020/5/23 2:56', '2020/5/24 8:39', '2020/5/24 8:40', '2020/5/24 10:59'],
'Check Out Date': ['2020/5/22 18:39', '2020/5/23 2:03', '2020/5/23 2:04', '2020/5/23 2:56', '2020/5/23 2:56', '2020/5/24 8:39', '2020/5/24 10:58', '2020/5/24 10:59'],
'Number': [1, 1, 1, 1, 2, 2, 2, 2],
'stage': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']})我试着在数据中做一些如下的计算:
1 2
a -> b 4:02:00 5:43:00
b -> c 0:01:00 0:01:00
c -> d 0:00:00 0:01:00这等于
1 2
a -> b b: ckeck in date - a: check out date b: ckeck in date - a: check out date
b -> c c: ckeck in date - b: check out date c: ckeck in date - b: check out date
c -> d d: ckeck in date - c: check out date d: ckeck in date - c: check out date我检查了与熊猫和数据有关的例子,但我仍然不知道如何实现这一点。有什么想法吗?
发布于 2020-08-15 08:16:07
使用DataFrameGroupBy.shift来移动列stage和Check Out Date,用DataFrame.unstack进行整形,因此在最后一步中,DataFrame.sub可能会减去移位的列
df['Check In Date'] = pd.to_datetime(df['Check In Date'])
df['Check Out Date'] = pd.to_datetime(df['Check Out Date'])
g = df.groupby('Number')
df = (df.assign(shitfted = g['Check Out Date'].shift(),
stage = g['stage'].shift() + ' -> ' + df['stage'])
.set_index(['stage','Number'])[['Check In Date','shitfted']]
.unstack()
.dropna()
)
df = df['Check In Date'].sub(df['shitfted'])
print (df)
Number 1 2
stage
a -> b 04:02:00 1 days 05:43:00
b -> c 00:01:00 0 days 00:01:00
c -> d 00:00:00 0 days 00:01:00编辑:
对于所有组合,所有组合都使用交叉连接和筛选:
df['Check In Date'] = pd.to_datetime(df['Check In Date'])
df['Check Out Date'] = pd.to_datetime(df['Check Out Date'])
from itertools import combinations
c = [f'{a} -> {b}' for a, b in (combinations(df['stage'].unique(), 2))]
print (c)
['a -> b', 'a -> c', 'a -> d', 'b -> c', 'b -> d', 'c -> d']
df = (df.merge(df, on='Number')
.assign(stage = lambda x: x.pop('stage_x') + ' -> ' + x.pop('stage_y'))
.query('stage in @c')
# df = df[df['stage'].isin(c)]
.set_index(['stage','Number'])[['Check In Date_y','Check Out Date_x']]
.unstack())
df = df['Check In Date_y'].sub(df['Check Out Date_x'])
print (df)
Number 1 2
stage
a -> b 04:02:00 1 days 05:43:00
a -> c 07:25:00 1 days 05:44:00
a -> d 07:25:00 1 days 08:03:00
b -> c 00:01:00 0 days 00:01:00
b -> d 00:01:00 0 days 02:20:00
c -> d 00:00:00 0 days 00:01:00https://stackoverflow.com/questions/63423913
复制相似问题