I have a DataFrame like this:
ColA ColB ColC
"lorem ipsum" ["lorem", "foo", "bar"]
"lorem ipsum" NaN
NaN ["lorem", "foo", "bar"]
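NaN NaN
I am trying to get the following output:
ColA ColB ColC
"lorem ipsum" ["lorem", "foo", "bar"] "lorem"
I tried using a list comprehension like this:
df["C"] = [elem for elem in df["B"] if elem in df["A"] ]
but it did not work:
If ColB is stored as lists, it raises TypeError: unhashable type: 'list'; if I use tuples instead, it raises ValueError: Length of values does not match length of index
Any help would be appreciated, thanks.
Edit + Edit 2: The two columns share only one word (or none), and I need to capture that word to put it in column C. I also forgot to mention that the values in ColA and ColB can be NaN.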
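A short sketch of why the attempt above raises those two errors, assuming a default integer index: the `in` operator applied to a pandas Series tests membership in the index labels, not in the values. The sample Series `a` and list `b` below are illustrative stand-ins for the question's columns.

```python
import pandas as pd

a = pd.Series(["lorem ipsum", "lorem ipsum"])  # index labels are 0 and 1

# `in` on a Series looks at the index labels, not at the values.
print(0 in a)              # True: 0 is an index label
print("lorem ipsum" in a)  # False: it is a value, not a label

# A list cannot be hashed, so the index lookup fails outright.
try:
    ["lorem", "foo", "bar"] in a
except TypeError as err:
    print("TypeError:", err)

# A tuple is hashable but is not an index label, so every element is
# filtered out and the comprehension returns a list of the wrong length.
b = [("lorem", "foo", "bar"), ("lorem", "foo", "bar")]
kept = [elem for elem in b if elem in a]
print(len(kept), "values for", len(a), "rows")
```

Assigning `kept` to a column would then fail because its length no longer matches the number of rows. The comparison has to be done row by row, pairing each value of A with the matching value of B.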
Answered 2019-01-25 22:49:16
Use a custom function with try + except, and pass the DataFrame to it through pipe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['lorem ipsum', 'lorem ipsum', np.nan, np.nan],
                   'B': [["lorem", "foo", "bar"], np.nan, ["lorem", "foo", "bar"], np.nan]})
print (df)
A B
0 lorem ipsum [lorem, foo, bar]
1 lorem ipsum NaN
2 NaN [lorem, foo, bar]
3 NaN NaN
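def test(df):
    out = []
    for a, b in zip(df["A"], df["B"]):
        try:
            # first element of B that occurs as a substring of A
            out.append(next(y for y in b if y in a))
        except Exception:
            # NaN in A or B (TypeError) or no match (StopIteration)
            out.append('')
    return out
df["C"] = df.pipe(test)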
print (df)
A B C
0 lorem ipsum [lorem, foo, bar] lorem
1 lorem ipsum NaN
2 NaN [lorem, foo, bar]
3 NaN NaN
Another solution, which does not work well:
df = df.fillna("undefined")
df["C"] = [next((y for y in b if y in a), '') for a, b, in zip(df["A"],df["B"])]
print (df)
A B C
0 lorem ipsum [lorem, foo, bar] lorem
1 lorem ipsum undefined u
2 undefined [lorem, foo, bar]
3 undefined undefined u
Answered 2019-01-25 22:11:07
You can define a custom function and then use map:
# data adapted from @jezrael
df = pd.DataFrame({'A':['lorem ipsum', 'lorem ipsum', np.nan, np.nan, 'test string'],
'B':[["lorem", "foo", "bar"], np.nan, ["lorem", "foo", "bar"], np.nan, ["no", "match"]]})
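def tester(val1, val2):
    # NaN is the only value that is not equal to itself
    if (val1 != val1) or (val2 != val2):
        return ''
    return next((x for x in val2 if x in val1), '')
df['C'] = list(map(tester, df['A'], df['B']))
The default argument '' passed to next ensures that you get an empty string wherever there is no match. We also make use of the fact that np.nan != np.nan.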
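The self-inequality check can be seen in isolation in the minimal sketch below. The helper name `is_nan` is hypothetical and only wraps the comparison that tester performs inline.

```python
import numpy as np

def is_nan(value):
    # NaN is the only float value that is not equal to itself.
    return value != value

print(is_nan(np.nan))            # True
print(is_nan("lorem ipsum"))     # False
print(is_nan(["lorem", "foo"]))  # False: comparing a list with itself gives a plain bool
```

Strings and lists always compare equal to themselves, so only the NaN cells take the early return. For the answer's own DataFrame, that covers rows 1 to 3.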
Result:
print(df)
A B C
0 lorem ipsum [lorem, foo, bar] lorem
1 lorem ipsum NaN
2 NaN [lorem, foo, bar]
3 NaN NaN
4 test string [no, match]
Answered 2019-01-25 22:47:31
After I replaced all the NaN values with fillna, the earlier solution worked great.
df = df.fillna("undefined")
df["C"] = [next((y for y in b if y in a), '') for a, b, in zip(df["A"],df["B"])]
Thanks.
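One caveat, visible in the "undefined" output of the earlier answer: a string placeholder is itself iterable, so the generator walks over its characters and can return a false match such as 'u'. Below is a minimal sketch of that effect and of one possible guard. The helper name `first_match` is hypothetical and does not come from any of the answers.

```python
a, b = "lorem ipsum", "undefined"  # a row where B was NaN before fillna

# Iterating over a string yields characters, and 'u' occurs in 'ipsum'.
print(next((y for y in b if y in a), ''))  # u

def first_match(a, b):
    # Hypothetical guard: only search when A is a string and B is a list.
    if not isinstance(a, str) or not isinstance(b, list):
        return ''
    return next((y for y in b if y in a), '')

print(repr(first_match(a, b)))                              # ''
print(first_match("lorem ipsum", ["lorem", "foo", "bar"]))  # lorem
```

With a type check like this, the NaN cells can stay as they are and no placeholder is needed. The fillna shortcut is only safe if no character of the placeholder can occur in column A.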
https://stackoverflow.com/questions/54366320