Question: Exactly matching dictionary keys against a Pandas DataFrame column and returning the appropriate values

Stack Overflow user

Asked on 2018-03-06 19:57:15

Answers: 2 | Views: 2.8K | Followers: 0 | Votes: 0

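To start, this question evolved from an earlier question of mine, which can be found here. I had several follow-ups that ended up changing the original question, so here we are.

Suppose we have the following data: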

import pandas as pd

d = {'keywords': ['cheapest cheap shoes', 'luxury shoes', 'cheap hiking shoes', 'liverpool']}
keywords = pd.DataFrame(d, columns=['keywords'])
In [7]: keywords
Out[7]:
               keywords
0  cheapest cheap shoes
1          luxury shoes
2    cheap hiking shoes
3             liverpool

I then create a dictionary containing the keywords that I want to match against the values in the DataFrame.

labels = {'cheape' : 'budget', 'cheap' : 'budget', 'luxury' : 'expensive', 
'hiking' : 'sport', 'pool': 'pool'}

The answer I was originally given solved the problem of matching the keys in the dictionary.

d = {'keywords': ['cheapest cheap shoes', 'luxury shoes',
                  'cheap hiking shoes', 'liverpool']}

keywords = pd.DataFrame(d, columns=['keywords'])

labels = {'cheape': 'budget', 'cheap': 'budget', 'luxury': 'expensive',
          'hiking': 'sport', 'pool': 'pool'}

df = pd.DataFrame(d)

def matcher(k):
    # `i in k` is a substring test, so 'pool' also matches 'liverpool'
    x = (i for i in labels if i in k)
    return ' | '.join(map(labels.get, x))

df['values'] = df['keywords'].map(matcher)

                keywords    values
0   cheapest cheap shoes    budget | budget
1   luxury shoes            expensive
2   cheap hiking shoes      budget | sport
3   liverpool               pool

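However, I have run into a matching problem caused by partial matches. In the output above, notice how the key 'cheape' matches "cheapest" and the key 'pool' matches "liverpool".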
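The partial matches come from how `in` behaves on strings. A minimal sketch in plain Python (the variable names are illustrative only) comparing a substring test with a test against a list of whole words:

```python
# `in` on a str is a substring test; `in` on a list compares whole items.
phrase = 'cheapest cheap shoes'

substring_hit = 'cheape' in phrase          # True: 'cheape' occurs inside 'cheapest'
token_hit = 'cheape' in phrase.split()      # False: no word equals 'cheape'

pool_substring = 'pool' in 'liverpool'          # True: substring of 'liverpool'
pool_token = 'pool' in 'liverpool'.split()      # False: the only word is 'liverpool'

print(substring_hit, token_hit, pool_substring, pool_token)
```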

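So my question is: is there a way to match my dictionary exactly against the values in keywords, so that partial matches are skipped?

The result I want is: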

                keywords    values
0   cheapest cheap shoes    budget
1   luxury shoes            expensive
2   cheap hiking shoes      budget | sport
3   liverpool               N/A

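Note: the dictionary will be expanded to include several keys associated with the same value. This is to catch spelling variations or typos, e.g. {'car' : 'Automobile', 'cars' : 'Automobile', 'carss' : 'Automobile'}, which is why I want exact matching, to prevent any duplicate or irrelevant values from appearing.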
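When several spelling variants share one value, a phrase containing two of them would still yield the same label twice, even with exact matching. A rough sketch of one way to drop such repeats while keeping first-seen order; the helper name `unique_labels` and the sample phrase are assumptions made for illustration:

```python
def unique_labels(words, labels):
    """Return the labels for `words`, dropping repeats but keeping first-seen order."""
    matched = (labels[w] for w in words if w in labels)
    # dict.fromkeys keeps insertion order and discards duplicate keys
    return list(dict.fromkeys(matched))

variants = {'car': 'Automobile', 'cars': 'Automobile', 'carss': 'Automobile'}
result = unique_labels('car cars for sale'.split(), variants)
print(result)
```

Here both 'car' and 'cars' are whole-word matches, yet `result` contains 'Automobile' only once.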

Cheers

python

pandas

dictionary

textmatching

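2 Answers

Stack Overflow user

Accepted answer

Posted on 2018-03-06 20:31:58

Here is a solution in line with my first one. str.split(' ') splits the string on spaces, so each dictionary key is compared against whole words rather than substrings.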

import pandas as pd

d = {'keywords' :['cheapest cheap shoes', 'luxury shoes',
                  'cheap hiking shoes', 'liverpool']}

keywords = pd.DataFrame(d, columns=['keywords'])

labels = {'cheape': 'budget', 'cheap': 'budget', 'luxury': 'expensive',
          'hiking': 'sport', 'pool':'pool'}

df = pd.DataFrame(d)

def matcher(k):
    x = (i for i in labels if i in k.split(' '))
    return ' | '.join(map(labels.get, x))

df['values'] = df['keywords'].map(matcher)

Result

               keywords          values
0  cheapest cheap shoes          budget
1          luxury shoes       expensive
2    cheap hiking shoes  budget | sport
3             liverpool
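The matcher above leaves the last row empty, whereas the desired output in the question shows N/A. A minimal sketch of one way to fill that gap, written in plain Python so it runs without pandas; the `missing` parameter and the `rows` list are illustrative assumptions, not part of the answer:

```python
labels = {'cheape': 'budget', 'cheap': 'budget', 'luxury': 'expensive',
          'hiking': 'sport', 'pool': 'pool'}

def matcher(phrase, labels=labels, missing='N/A'):
    """Join the labels of keys that equal a whole word in `phrase`."""
    words = phrase.split()
    hits = [labels[key] for key in labels if key in words]
    # Fall back to the placeholder when no key matched a whole word
    return ' | '.join(hits) if hits else missing

rows = ['cheapest cheap shoes', 'luxury shoes', 'cheap hiking shoes', 'liverpool']
values = [matcher(r) for r in rows]
print(values)
```

With the DataFrame from the answer, the same function should be usable as `df['values'] = df['keywords'].map(matcher)`.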

Votes: 1

Stack Overflow user

Posted on 2018-03-06 20:19:29

Try this:

df['values'] = (df['keywords']
                 .str.split(expand=True)
                 .apply(lambda x: x.map(labels).add(' | ').fillna(''))
                 .sum(axis=1)
                 .str.rstrip(' | ')
                 .replace('', 'N/A'))

Result:

In [60]: df
Out[60]:
               keywords          values
0  cheapest cheap shoes          budget
1          luxury shoes       expensive
2    cheap hiking shoes  budget | sport
3             liverpool             N/A
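Both answers split on whitespace, so a keyword followed by punctuation (for example "cheap,") would not match. As a possible alternative, not taken from either answer, the sketch below uses the standard `re` module with `\b` word boundaries; the name `regex_matcher` and the punctuated sample phrase are assumptions for illustration:

```python
import re

labels = {'cheape': 'budget', 'cheap': 'budget', 'luxury': 'expensive',
          'hiking': 'sport', 'pool': 'pool'}

# \b on both sides means a key only matches when it forms a complete word
pattern = re.compile(r'\b(' + '|'.join(map(re.escape, labels)) + r')\b')

def regex_matcher(phrase):
    """Join the labels of every whole-word key found in `phrase`."""
    hits = [labels[word] for word in pattern.findall(phrase)]
    return ' | '.join(hits) or 'N/A'

print(regex_matcher('cheapest cheap shoes'))
print(regex_matcher('cheap, hiking shoes!'))
print(regex_matcher('liverpool'))
```

Unlike the dictionary-order loop in the accepted answer, this version reports labels in the order the words appear in the phrase.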

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/49138985
