How to reshape a DataFrame group by group and add the next step within each group as a new column.
import pandas as pd
df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'step': [1, 2, 3, 1, 3, 4, 1]
})
print(df)
  id  step
0  A     1
1  A     2
2  A     3
3  B     1
4  B     3
5  B     4
6  C     1
# target format
  id  current_step next_step
0  A             1         2
1  A             2         3
2  A             3      None
3  B             1         3
4  B             3         4
5  B             4      None
6  C             1      None
This works well:
df.groupby(['id']).apply(
    lambda df: df.assign(next_step = df['step'].shift(-1))
).reset_index(drop=True)
Posted on 2022-04-28 09:50:00
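You can shift the 'id' and 'step' columns up by one row and add the shifted values as new columns, which fills in the next_step column, like this: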
df[['id2', 'next_step']] = df[['id', 'step']].shift(-1)
  id  step  id2  next_step
0  A     1    A        2.0
1  A     2    A        3.0
2  A     3    B        1.0
3  B     1    B        3.0
4  B     3    B        4.0
5  B     4    C        1.0
6  C     1  NaN        NaN
You can then find the rows where id != id2 and use .loc to replace the value in next_step with None on those rows.
df.loc[df['id'] != df['id2'], 'next_step'] = None
  id  step  id2  next_step
0  A     1    A        2.0
1  A     2    A        3.0
2  A     3    B        NaN
3  B     1    B        3.0
4  B     3    B        4.0
5  B     4    C        NaN
6  C     1  NaN        NaN
You can then drop the 'id2' column and rename 'step' to 'current_step', like this:
df = df.drop('id2', axis=1)
df = df.rename(columns={'step':'current_step'})
Final output:
  id  current_step  next_step
0  A             1        2.0
1  A             2        3.0
2  A             3        NaN
3  B             1        3.0
4  B             3        4.0
5  B             4        NaN
6  C             1        NaN
Posted on 2022-06-20 16:43:49
>>> import pandas as pd
>>> from datar.all import f, group_by, summarise, lead
>>>
>>> df = pd.DataFrame({
... 'id': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
... 'step': [1,2,3,1,3,4,1]
... })
>>>
>>> df >> group_by(f.id) >> summarise(current_step = f.step, next_step = lead(f.step))
[2022-06-20 09:43:04][datar][ INFO] `summarise()` has grouped output by ['id'] (override with `_groups` argument)
id current_step next_step
<object> <int64> <float64>
0 A 1 2.0
1 A 2 3.0
2 A 3 NaN
3 B 1 3.0
4 B 3 4.0
5 B 4 NaN
6 C 1 NaN
[TibbleGrouped: id (n=3)]
https://stackoverflow.com/questions/72041369
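Both answers above compute, for each row, the following step within the same id. As a minimal sketch of the same idea in plain pandas, a grouped shift avoids both the helper column and `apply`. The variable names `next_step` and `result` are illustrative, and the cast to the nullable `Int64` dtype is an optional assumption about the desired output: it keeps whole numbers and marks missing values as `<NA>` instead of showing floats such as 2.0.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'step': [1, 2, 3, 1, 3, 4, 1],
})

# Shift 'step' up by one row within each 'id' group. The last row of each
# group has no successor, so it becomes NaN.
next_step = df.groupby('id')['step'].shift(-1)

result = (
    df.rename(columns={'step': 'current_step'})
      # Optional (assumption): nullable Int64 keeps integers and uses <NA> for gaps.
      .assign(next_step=next_step.astype('Int64'))
)
print(result)
```

Because the shift is done per group, values never leak from one id into the next, so no masking step is needed afterwards. If float output with NaN is acceptable, the `astype('Int64')` call can simply be dropped.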