Q: Create a new column by finding exact words in a string column

Stack Overflow user

Asked 2018-04-11 08:05:24

Answers 2 · Views 3.5K · Followers 0 · Votes 3

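I want to create a new column containing 1 or 0, depending on whether any word from a list matches exactly (as a whole word) in a string column of the dataframe.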

list_provided=["mul","the"]
#how my dataframe looks
id  text
a    simultaneous there the
b    simultaneous there
c    mul why

Expected output

id  text                     found
a    simultaneous there the   1
b    simultaneous there       0
c    mul why                  1

The second row is assigned 0 because neither "mul" nor "the" appears as an exact, whole-word match in the "text" column.

Code tried so far

#For exact match I am using the below code
data["Found"]=np.where(data["text"].str.contains(r'(?:\s|^)penalidades(?:\s|$)'),1,0)

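How can I iterate in a loop to find exact matches for every word in the provided list?

Edit: If I use str.contains(pattern) as suggested by Georgey, every row's "Found" value becomes 1.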

data=pd.DataFrame({"id":("a","b","c","d"), "text":("simultaneous there the","simultaneous there","mul why","mul")})
list_of_word=["mul","the"]
pattern = '|'.join(list_of_word)
data["Found"]=np.where(data["text"].str.contains(pattern),1,0)

Output:
id  text                     found
a    simultaneous there the   1
b    simultaneous there       1
c    mul why                  1
d    mul                      1

The second row of the "Found" column should be 0.
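The plain alternation above matches substrings, so "the" is found inside "there" and "mul" inside "simultaneous". One possible fix, offered only as a sketch, is to wrap the joined words in the same `(?:\s|^)` and `(?:\s|$)` anchors used in the single-word attempt. The `alternation` variable and the use of `re.escape` are additions for illustration and are not part of the original post; the sketch assumes words are separated by whitespace.

```python
import re

import numpy as np
import pandas as pd

data = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "text": ["simultaneous there the", "simultaneous there", "mul why", "mul"],
})
list_of_word = ["mul", "the"]

# Escape each word so any regex metacharacters are treated literally,
# then join the words into one non-capturing alternation.
alternation = "|".join(re.escape(w) for w in list_of_word)

# Require whitespace or a string edge on both sides of the matched word.
pattern = rf"(?:\s|^)(?:{alternation})(?:\s|$)"

data["Found"] = np.where(data["text"].str.contains(pattern), 1, 0)
print(data)
```

With this pattern the second row no longer matches, because "the" in "there" is followed by "r" rather than whitespace or the end of the string.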

Tags: dataframe, python, string, python-3.x, pandas

2 Answers

Stack Overflow user

Accepted answer

Posted 2018-04-11 08:25:05

You can do this with pd.Series.apply and sum over a generator expression:

import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'c'],
                   'text': ['simultaneous there the', 'simultaneous there', 'mul why']})

test_set = {'mul', 'the'}

df['found'] = df['text'].apply(lambda x: sum(i in test_set for i in x.split()))

#   id                    text  found
# 0  a  simultaneous there the      1
# 1  b      simultaneous there      0
# 2  c                 mul why      1

The above gives a count of matching words. If you only need a Boolean, use any:

df['found'] = df['text'].apply(lambda x: any(i in test_set for i in x.split()))

For an integer representation, chain .astype(int).
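As a minimal sketch of that chaining, reusing the same `df` and `test_set` as above (the parenthesised multi-line layout is only a formatting choice):

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'c'],
                   'text': ['simultaneous there the', 'simultaneous there', 'mul why']})

test_set = {'mul', 'the'}

# any() yields True/False per row; astype(int) converts that to 1/0.
df['found'] = (
    df['text']
    .apply(lambda x: any(i in test_set for i in x.split()))
    .astype(int)
)

print(df)
```

Unlike the sum version, this stays at 1 even when a row contains several of the listed words.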

Votes 5

Stack Overflow user

Posted 2018-04-11 10:37:27

Edit 1

Try this code:

import pandas as pd

list_of_word = ["mul", "the"]

DataF = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "text": ["simultaneous there the", "simultaneous there", "mul why", "mul"],
})

found = []
for text in DataF["text"]:
    # Flag the row if any space-separated token is in the word list.
    # Checking membership works for a list of any length, not just two words.
    matched = False
    for word in text.split(" "):
        if word in list_of_word:
            matched = True
            break
    found.append(1 if matched else 0)

DataF["found"] = found

print(DataF)

This gives you:

  id                    text  found
0  a  simultaneous there the      1
1  b      simultaneous there      0
2  c                 mul why      1
3  d                     mul      1

Votes 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/49769706
