I have a DataFrame (FinalDF) that looks like this:
id | Movie | Cast
-------------------------
0 | The Dark Knight | Christopher Nolan
1 | The Dark Knight | Christian Bale
2 | Pulp Fiction | Quentin Tarantino
3 | Pulp Fiction | John Travolta
4 | Schindler’s List | Steven Spielberg
5 | Schindler’s List | Liam Neeson

In movie_cast_DF, the movie and cast names are mapped to IDs like this:
id | name | uuid
-------------------------
1 | The Dark Knight | m1
2 | Pulp Fiction | m2
3 | Schindler’s List | m3
4 | Christopher Nolan | d1
5 | Christian Bale | a1
6 | Quentin Tarantino | d2
7 | John Travolta | a2
8 | Steven Spielberg | d3
9 | Liam Neeson | a3

I need to map these IDs into FinalDF as new columns, like this:
id | Movie | Cast | mid | cid
------------------------------------------------------------------
0 | The Dark Knight | Christopher Nolan | m1 | d1
1 | The Dark Knight | Christian Bale | m1 | a1
2 | Pulp Fiction | Quentin Tarantino | m2 | d2
3 | Pulp Fiction | John Travolta | m2 | a2
4 | Schindler’s List | Steven Spielberg | m3 | d3
5 | Schindler’s List | Liam Neeson | m3 | a3

I tried the following approach:
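def getID(x):
    # Scan the whole lookup table for every value: slow on large frames.
    try:
        return movie_cast_DF[movie_cast_DF['name'].str.contains(x.lower(), case=False)]['uuid'].values[0]
    except Exception:
        # No match found (or x is not a string)
        return None

FinalDF['mid'] = FinalDF['Movie'].apply(getID)
FinalDF['cid'] = FinalDF['Cast'].apply(getID)
FinalDF.head()

Is there a more efficient and faster way to do this mapping?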
Posted on 2018-01-10 21:15:28
First, set name as the index of df2 (here df2 corresponds to movie_cast_DF and df to FinalDF).
dfmap = df2.set_index("name").uuid
dfmap
name
The Dark Knight m1
Pulp Fiction m2
Schindler’s List m3
Christopher Nolan d1
Christian Bale a1
Quentin Tarantino d2
John Travolta a2
Steven Spielberg d3
Liam Neeson a3
Name: uuid, dtype: object

We will use this Series to map the keys in df to their values. Next, call map (or replace) twice:
df['mid'] = df.Movie.map(dfmap)
df['cid'] = df.Cast.map(dfmap)
df
Movie Cast mid cid
id
0 The Dark Knight Christopher Nolan m1 d1
1 The Dark Knight Christian Bale m1 a1
2 Pulp Fiction Quentin Tarantino m2 d2
3 Pulp Fiction John Travolta m2 a2
4 Schindler’s List Steven Spielberg m3 d3

https://stackoverflow.com/questions/48196115
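For reference, the map-based approach from the answer can be run end to end with a sketch like the one below. The sample frames are rebuilt from the tables in the question. Two details are assumptions added here and are not part of the original answer: names are lower-cased on both sides to approximate the case-insensitive matching in the question's attempt, and duplicate names are dropped because mapping through a Series needs a unique index. Unlike str.contains, this matches whole names only, and names without a match come back as NaN.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
FinalDF = pd.DataFrame({
    "Movie": ["The Dark Knight", "The Dark Knight", "Pulp Fiction",
              "Pulp Fiction", "Schindler’s List", "Schindler’s List"],
    "Cast": ["Christopher Nolan", "Christian Bale", "Quentin Tarantino",
             "John Travolta", "Steven Spielberg", "Liam Neeson"],
})
movie_cast_DF = pd.DataFrame({
    "id": range(1, 10),
    "name": ["The Dark Knight", "Pulp Fiction", "Schindler’s List",
             "Christopher Nolan", "Christian Bale", "Quentin Tarantino",
             "John Travolta", "Steven Spielberg", "Liam Neeson"],
    "uuid": ["m1", "m2", "m3", "d1", "a1", "d2", "a2", "d3", "a3"],
})

# Build the lookup Series once: index = lower-cased name, values = uuid.
# Lower-casing and drop_duplicates are assumptions of this sketch:
# the former mimics case-insensitive matching, the latter keeps the
# index unique, which Series.map needs when given a Series.
lookup = (
    movie_cast_DF.assign(key=movie_cast_DF["name"].str.lower())
    .drop_duplicates("key")
    .set_index("key")["uuid"]
)

# One vectorized lookup per column instead of one table scan per row.
FinalDF["mid"] = FinalDF["Movie"].str.lower().map(lookup)
FinalDF["cid"] = FinalDF["Cast"].str.lower().map(lookup)

print(FinalDF)
```

Building the lookup once and calling map avoids rescanning movie_cast_DF for every row, which is what makes the apply-based attempt slow on larger frames.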