Here is my data:

Id  Column_1               Column_2
1   United States          United Tractor
2   Love of Fair           Tales of Grim
3   Hotel Marriot Jakarta  Marriot Hotel Jakarta

Here is my expected output:

Id  Column_1               Column_2               Word
1   United States          United Tractor         united
2   Love of Fair           Tales of Grim          of
3   Hotel Marriot Jakarta  Marriot Hotel Jakarta  hotel marriot

Data:

{'Id': [1, 2, 3],
 'Column_1': ['United States', 'Love of Fair', 'Hotel Marriot Jakarta'],
 'Column_2': ['United Tractor', 'Tales of Grim', 'Marriot Hotel Jakarta']}

Posted on 2022-04-12 06:05:40
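One option is to use set.intersection (the & operator) in a list comprehension:

# Lowercase and split both cells, intersect the word sets, join what is common.
df['Word'] = [' '.join(set(x.lower().split()) & set(y.lower().split()))
              for x, y in zip(df['Column_1'], df['Column_2'])]

Another option is to stack the two columns, then use groupby.apply with a lambda that performs the set.intersection:

# After stack(), each original row becomes a group of two word lists (level=0 is the row label).
df['Word'] = (df[['Column_1', 'Column_2']].stack().str.lower().str.split()
              .groupby(level=0).apply(lambda x: ' '.join(set(x.iat[0]) & set(x.iat[1]))))

Output:
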
   Id               Column_1               Column_2                   Word
0   1          United States         United Tractor                 united
1   2           Love of Fair          Tales of Grim                     of
2   3  Hotel Marriot Jakarta  Marriot Hotel Jakarta  hotel marriot jakarta

Posted on 2022-04-12 07:00:31
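A very similar but slightly different solution:

# Note: DataFrame.applymap is deprecated since pandas 2.1 (use DataFrame.map there).
df['Word'] = (df[['Column_1', 'Column_2']]
              .applymap(lambda x: set(x.lower().split()))  # each cell -> set of lowercase words
              .apply(lambda x: ' '.join(x.Column_1 & x.Column_2), axis=1))  # row-wise intersection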
>>> df
'''
   Id               Column_1               Column_2                   Word
0   1          United States         United Tractor                 united
1   2           Love of Fair          Tales of Grim                     of
2   3  Hotel Marriot Jakarta  Marriot Hotel Jakarta  hotel marriot jakarta
'''

Source: https://stackoverflow.com/questions/71837782
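
All of the snippets above go through Python sets, and sets do not guarantee any particular order, so when a row has several words in common they may come out in a different order than shown. If the order matters, one possible workaround is sketched below. It is not part of the original answers: the helper name common_words is hypothetical, and the assumption is that the common words should follow their order in Column_1. Plain lists stand in for the two columns so the sketch runs without pandas.

```python
def common_words(left: str, right: str) -> str:
    """Return the lowercase words of `left` that also occur in `right`,
    keeping the order in which they appear in `left` and dropping repeats."""
    right_words = set(right.lower().split())
    seen = set()
    kept = []
    for word in left.lower().split():
        if word in right_words and word not in seen:
            seen.add(word)
            kept.append(word)
    return ' '.join(kept)


# The two text columns from the question, as plain lists.
column_1 = ['United States', 'Love of Fair', 'Hotel Marriot Jakarta']
column_2 = ['United Tractor', 'Tales of Grim', 'Marriot Hotel Jakarta']

words = [common_words(x, y) for x, y in zip(column_1, column_2)]
print(words)  # ['united', 'of', 'hotel marriot jakarta']
```

With the DataFrame from the question, the same helper should drop into the list-comprehension form: df['Word'] = [common_words(x, y) for x, y in zip(df['Column_1'], df['Column_2'])].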