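I combined the category names with the skill names and sorted the result by category name. Now I have a table with a column that looks like this: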
(Category1) Skill 1
(Category1) Skill 2
(Category1) Skill 3
(Category1) Skill 4
(Category1) Skill 5
(Category1) Skill 6
(Category2) Skill 7
(Category2) Skill 8
(Category2) Skill 9
(Category2) Skill 10
(Category2) Skill 11
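(Category2) Skill 12
I want to keep the category label only on the first skill of each category and remove it from the rest, so that the table looks like this: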
(Category1) Skill 1
Skill 2
Skill 3
Skill 4
Skill 5
Skill 6
(Category2) Skill 7
Skill 8
Skill 9
Skill 10
Skill 11
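Skill 12
Any ideas? Thanks.
Posted on 2019-09-06 10:34:05
You can split the string to get the trailing "Skill x" part, check where the "Categoryx" part is duplicated, and use that mask to replace those rows with the split-off skill part: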
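import numpy as np
# Split "(Category1) Skill 1" into the category part and the skill part
m = df.col1.str.split(r'\) ', expand=True)
# Where the category has already appeared, keep only the skill part
df['col1'] = np.where(m.duplicated(subset=0), m[1], df.col1)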
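For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. The sample DataFrame is rebuilt from the question, and the helper name keep_first_label is hypothetical (it is not part of pandas or of the original answer).

```python
import numpy as np
import pandas as pd


def keep_first_label(col: pd.Series) -> pd.Series:
    """Keep the '(CategoryX) ' prefix only on the first row of each category.

    Hypothetical helper, written for illustration only.
    """
    # Split '(Category1) Skill 1' into '(Category1' and 'Skill 1'
    parts = col.str.split(r'\) ', expand=True)
    # True for every row whose category part already appeared in an earlier row
    repeated = parts[0].duplicated()
    # Use only the skill part on repeated rows, the original text otherwise
    return pd.Series(np.where(repeated, parts[1], col),
                     index=col.index, name=col.name)


# Sample data assumed from the question: skills 1-6 in Category1, 7-12 in Category2
df = pd.DataFrame({
    'col1': [f'(Category{1 if i <= 6 else 2}) Skill {i}' for i in range(1, 13)]
})
df['col1'] = keep_first_label(df['col1'])
print(df)
```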
col1
0 (Category1) Skill 1
1 Skill 2
2 Skill 3
3 Skill 4
4 Skill 5
5 Skill 6
6 (Category2) Skill 7
7 Skill 8
8 Skill 9
9 Skill 10
10 Skill 11
11 Skill 12
Input data:
col1
0 (Category1) Skill 1
1 (Category1) Skill 2
2 (Category1) Skill 3
3 (Category1) Skill 4
4 (Category1) Skill 5
5 (Category1) Skill 6
6 (Category2) Skill 7
7 (Category2) Skill 8
8 (Category2) Skill 9
9 (Category2) Skill 10
10 (Category2) Skill 11
11 (Category2) Skill 12
Posted on 2019-09-06 11:28:15
Assuming the column of your dataframe (df) is named 'A':
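df2 = df.A.str.split(expand=True)
# Blank out a category label when it repeats the label of the previous row
df2[0] = df2[0].mask(df2[0].eq(df2[0].shift())).fillna('')
# Re-join the pieces and strip the leading space left by a blanked label
df.A = df2.apply(lambda x: ' '.join(x), axis=1).str.strip()
https://stackoverflow.com/questions/57820325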
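Here is a small sketch of the compare-with-previous-row variant, assuming the column is named 'A' as in the answer above. It splits only once (n=1), so the skill name stays in one piece and nothing has to be re-joined. The sample data and the names label, skill and starts_group are made up for illustration.

```python
import pandas as pd

# Sample data assumed for illustration; Category1 reappears at the end on purpose
df = pd.DataFrame({'A': ['(Category1) Skill 1', '(Category1) Skill 2',
                         '(Category2) Skill 7', '(Category2) Skill 8',
                         '(Category1) Skill 3']})

# Split once on the first whitespace: column 0 is the label, column 1 the skill
parts = df['A'].str.split(n=1, expand=True)
label, skill = parts[0], parts[1]

# A row starts a new group when its label differs from the previous row's label
starts_group = label.ne(label.shift())

# Keep the full text on group starts, only the skill name elsewhere
df['A'] = df['A'].where(starts_group, skill)
print(df)
```

Because each row is compared only with the row directly before it, a category that shows up again later gets its label back, whereas a duplicated-based mask would drop it. On data that is already sorted by category, as in the question, both approaches should give the same result.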