Imagine I have a DataFrame that lists candidates along with their written and spoken language skills:
import pandas as pd

df = pd.DataFrame({'candidate': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'd'],
                   'type': ['spoken', 'written', 'spoken', 'written', 'spoken', 'written', 'spoken', 'written', 'written', 'written'],
                   'language': ['English', 'German', 'French', 'English', 'English', 'English', 'French', 'English', 'German', 'French'],
                   'skill': [5, 4, 4, 6, 8, 1, 3, 5, 2, 2]})
Result:
candidate type language skill
a spoken English 5
a written German 4
a spoken French 4
b written English 6
b spoken English 8
c written English 1
c spoken French 3
d written English 5
d written German 2
d written French 2
Another DataFrame with the languages:
languages = pd.DataFrame({'language': ['English', 'English', 'French', 'French', 'German', 'German'],
                          'type': ['spoken', 'written', 'spoken', 'written', 'spoken', 'written']})
Result:
language type
0 English spoken
1 English written
2 French spoken
3 French written
4 German spoken
5 German written
What I need is a DataFrame that combines df with every possible combination from languages, like so:
candidate type language skill
a spoken English 5
a written English NA
a spoken German NA
a written German 4
a spoken French 4
a written French NA
b spoken English 8
b written English 6
b spoken French NA
b written French NA
...
d spoken English NA
d written English 5
d spoken French NA
d written French 2
d spoken German NA
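d written German 2
And so on. I tried adding a "valid" column filled with the value "valid" and using every kind of merge on these DataFrames, but it always just returns df. Is there a quick way to do this in pandas?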
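The dummy "valid" column described above reads like the classic cross-join trick. A minimal sketch of that idea follows, assuming pandas 1.2 or later, where `merge(how="cross")` replaces the dummy key. The names `candidates`, `grid` and `full` are illustrative only. The point is that the grid is built from the distinct candidates, not from df itself, and the known skills are then left-merged onto it.

```python
import pandas as pd

df = pd.DataFrame({
    "candidate": ["a", "a", "a", "b", "b", "c", "c", "d", "d", "d"],
    "type": ["spoken", "written", "spoken", "written", "spoken",
             "written", "spoken", "written", "written", "written"],
    "language": ["English", "German", "French", "English", "English",
                 "English", "French", "English", "German", "French"],
    "skill": [5, 4, 4, 6, 8, 1, 3, 5, 2, 2],
})
languages = pd.DataFrame({
    "language": ["English", "English", "French", "French", "German", "German"],
    "type": ["spoken", "written", "spoken", "written", "spoken", "written"],
})

# One row per distinct candidate (hypothetical helper frame).
candidates = df[["candidate"]].drop_duplicates()

# Pair every candidate with every (language, type) row of `languages`.
grid = candidates.merge(languages, how="cross")

# Attach the known skills; combinations absent from df get NaN.
full = grid.merge(df, on=["candidate", "language", "type"], how="left")
print(full)
```

Unlike a full product of the unique values, this only produces the (language, type) pairs actually listed in `languages`, which matters if that table is not a complete cross product.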
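Answered 2022-09-18 01:53:36
Try:
def fn(x):
    # Outer-merge one candidate's rows with every (language, type) pair;
    # the merge uses the shared columns "type" and "language".
    x = x.merge(languages, how="outer")
    # Rows that came only from `languages` have no candidate yet; fill it in.
    x["candidate"] = x["candidate"].ffill().bfill()
    return x

df = (
    df.groupby("candidate")
    .apply(fn)
    .reset_index(drop=True)
    .sort_values(["candidate", "language", "type"])
)
print(df)
Prints: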
candidate type language skill
0 a spoken English 5.0
3 a written English NaN
2 a spoken French 4.0
4 a written French NaN
5 a spoken German NaN
1 a written German 4.0
7 b spoken English 8.0
6 b written English 6.0
8 b spoken French NaN
9 b written French NaN
10 b spoken German NaN
11 b written German NaN
14 c spoken English NaN
12 c written English 1.0
13 c spoken French 3.0
15 c written French NaN
16 c spoken German NaN
17 c written German NaN
21 d spoken English NaN
18 d written English 5.0
22 d spoken French NaN
20 d written French 2.0
23 d spoken German NaN
19 d written German 2.0
Answered 2022-09-18 02:39:54
# pip install pyjanitor
import janitor
import pandas as pd
df.complete('candidate', 'type', 'language')
candidate type language skill
0 a spoken English 5.0
1 a spoken German NaN
2 a spoken French 4.0
3 a written English NaN
4 a written German 4.0
5 a written French NaN
6 b spoken English 8.0
7 b spoken German NaN
8 b spoken French NaN
9 b written English 6.0
10 b written German NaN
11 b written French NaN
12 c spoken English NaN
13 c spoken German NaN
14 c spoken French 3.0
15 c written English 1.0
16 c written German NaN
17 c written French NaN
18 d spoken English NaN
19 d spoken German NaN
20 d spoken French NaN
21 d written English 5.0
22 d written German 2.0
23 d written French 2.0
For your use case that is not strictly necessary; you can use the languages DataFrame instead by passing it as a dictionary:
languages = {'language': ['English', 'English', 'French',
'French', 'German', 'German'],
'type': ['spoken', 'written', 'spoken',
'written', 'spoken', 'written']}
df.complete('candidate', languages)
candidate type language skill
0 a spoken English 5.0
1 a written English NaN
2 a spoken French 4.0
3 a written French NaN
4 a spoken German NaN
5 a written German 4.0
6 b spoken English 8.0
7 b written English 6.0
8 b spoken French NaN
9 b written French NaN
10 b spoken German NaN
11 b written German NaN
12 c spoken English NaN
13 c written English 1.0
14 c spoken French 3.0
15 c written French NaN
16 c spoken German NaN
17 c written German NaN
18 d spoken English NaN
19 d written English 5.0
20 d spoken French NaN
21 d written French 2.0
22 d spoken German NaN
23 d written German 2.0
If you are not keen on importing another library, you can do this just as efficiently in pandas:
index = (pd.MultiIndex
.from_product(
[df.candidate.unique(),
df['type'].unique(),
df['language'].unique()],
names = ['candidate', 'type', 'language']
))
index = pd.DataFrame([], index = index)
index.merge(df, how = 'outer', on = index.index.names)
candidate type language skill
0 a spoken English 5.0
1 a spoken German NaN
2 a spoken French 4.0
3 a written English NaN
4 a written German 4.0
5 a written French NaN
6 b spoken English 8.0
7 b spoken German NaN
8 b spoken French NaN
9 b written English 6.0
10 b written German NaN
11 b written French NaN
12 c spoken English NaN
13 c spoken German NaN
14 c spoken French 3.0
15 c written English 1.0
16 c written German NaN
17 c written French NaN
18 d spoken English NaN
19 d spoken German NaN
20 d spoken French NaN
21 d written English 5.0
22 d written German 2.0
23 d written French 2.0
https://stackoverflow.com/questions/73759551
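A closely related variant of the MultiIndex approach above, offered as a sketch and not as part of the original answer, is to reindex df against the full product instead of merging. It assumes each (candidate, type, language) triple appears at most once in df, since `reindex` requires unique keys. The names `keys`, `full_index` and `full` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "candidate": ["a", "a", "a", "b", "b", "c", "c", "d", "d", "d"],
    "type": ["spoken", "written", "spoken", "written", "spoken",
             "written", "spoken", "written", "written", "written"],
    "language": ["English", "German", "French", "English", "English",
                 "English", "French", "English", "German", "French"],
    "skill": [5, 4, 4, 6, 8, 1, 3, 5, 2, 2],
})

keys = ["candidate", "type", "language"]

# All combinations of the unique values found in each key column.
full_index = pd.MultiIndex.from_product(
    [df[k].unique() for k in keys], names=keys
)

# Assumption: the key triples are unique in df (true for this sample data).
# Missing combinations are filled with NaN by reindex.
full = df.set_index(keys).reindex(full_index).reset_index()
print(full)
```

Because the index is a product of unique values, this yields every type for every language, whether or not that pair is listed in a separate languages table.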