Pandas: select rows that match a string and create a new column (found) with the matched word
list_provided = ["mul", "the", "have", "then"]

What my data looks like:

id  text
a   simultaneous there the
b   simultaneous there
c   mul why

Expected output:

id  text                    found
1   simultaneous there the  the
2   simultaneous there
3   mul why                 mul
4   have the                have, the
5   then the late           then,the

Posted on 2019-03-09 09:43:57
I think something like this should work:
df['text'].apply(lambda x: [i for i in x.split() if i in list_provided])

Posted on 2019-03-09 10:04:24
Another approach, using a regex pattern:
pat = r'\b' + r'\b|\b'.join(list_provided) + r'\b'
df['found'] = df.text.str.findall(pat)
  id                    text        found
0  a  simultaneous there the        [the]
1  b      simultaneous there           []
2  c                 mul why        [mul]
3  d                have the  [have, the]
4  e           then the late  [then, the]

https://stackoverflow.com/questions/55075968
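
Both answers above return a list per row, while the expected output in the question shows a comma-separated string. The following is a minimal sketch of one way to bridge that gap, not the answerers' own code. The sample rows are copied from the output table above, the column names `found_split` and `found_regex` are made up for illustration, and the use of `re.escape` with a non-capturing group is an added precaution that the original pattern does not have.

```python
import re

import pandas as pd

list_provided = ["mul", "the", "have", "then"]

# Sample rows copied from the output table above.
df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e"],
    "text": [
        "simultaneous there the",
        "simultaneous there",
        "mul why",
        "have the",
        "then the late",
    ],
})

# Approach 1: split on whitespace and keep the words that are in the list.
# A set makes each membership test O(1); joining gives a plain string.
wanted = set(list_provided)
df["found_split"] = df["text"].apply(
    lambda s: ", ".join(w for w in s.split() if w in wanted)
)

# Approach 2: one whole-word regex. re.escape guards against list entries
# that contain regex metacharacters (an assumption; the sample words have none).
pat = r"\b(?:" + "|".join(re.escape(w) for w in list_provided) + r")\b"
df["found_regex"] = df["text"].str.findall(pat).str.join(", ")

print(df)
```

On these five rows both columns come out as `the`, an empty string, `mul`, `have, the`, and `then, the`. The two approaches can diverge on text with punctuation: `split()` would leave a token such as `the,` intact and miss it, whereas the `\b` word boundaries in the regex would still match `the`.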