
Filtering SpaCy noun_chunks by pos_tag

Stack Overflow user
Asked on 2020-05-12 16:45:08
Answers: 2 · Views: 1.5K · Followers: 0 · Votes: 1

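As the subject line says, I'm trying to extract the elements of noun_chunks based on their individual POS tags. The elements of a noun_chunk don't seem to have access to the POS tags assigned across the whole sentence.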

To demonstrate the problem:

[i.pos_ for i in nlp("Great coffee at a place with a great view!").noun_chunks]
>>> 
AttributeError: 'spacy.tokens.span.Span' object has no attribute 'pos_'

Here is my inefficient workaround:

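import spacy

# Model name is an assumption; the question does not say which model was loaded.
nlp = spacy.load("en_core_web_sm")

def parse(text):
    doc = nlp(text.lower())
    # (token index, token text, POS tag) for every token in the document
    tags = [(idx, i.text, i.pos_) for idx, i in enumerate(doc)]

    chunks = list(doc.noun_chunks)

    # Character offsets covered by any noun chunk
    indices = []
    for c in chunks:
        indices.extend(range(c.start_char, c.end_char))
    # Words of the original text that fall outside every noun chunk
    non_chunks = [w for w in ''.join([i for idx, i in enumerate(text) if idx not in indices]).split(' ')
                  if w != '']

    chunk_words = [tup[1] for tup in tags if tup[1] not in non_chunks and tup[2] not in ['DET', 'VERB', 'SYM', 'NUM']]  # these are the POS tags which I wanted to filter out from the beginning!

    # Rebuild each chunk from its surviving words; keep only multi-word results
    new_chunks = []
    for c in chunks:
        new_words = [w for w in str(c).split(' ') if w in chunk_words]
        if len(new_words) > 1:
            new_chunk = ' '.join(new_words)
            new_chunks.append(new_chunk)
    return new_chunks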

parse(
"""
I may be biased about Counter Coffee since I live in town, but this is a great place that makes a great cup of coffee. I have been coming here for about 2 years and wish I would have found it sooner. It is located right in the heart of Forest Park and there is a ton of street parking. The coffee here is great....many other words could describe it, but that sums it up perfectly. You can by coffee by the pound, order a hot drink, and they also have food. On the weekend, there are donuts brought in from Do-Rite Donuts which have almost a cult like following. The food is a little on the high end price wise, but totally worth it. I am a self admitted latte snob and they make an amazing latte here. You can add skim, whole, almond or oat milk and they will make it happen. I always order easy foam and they always make it perfectly. My girlfriend loves the Chai Latte with Oat Milk and I will admit it is pretty good. Give them a try.
""")

>>>
['counter coffee',
 'great place',
 'great cup',
 'forest park',
 'street parking',
 'many other words',
 'hot drink',
 'almost cult',
 'high end price',
 'latte snob',
 'amazing latte',
 'oat milk',
 'easy foam',
 'chai latte',
 'oat milk']

Any faster solution would be welcome!


2 Answers

Stack Overflow user

Accepted answer

Posted on 2020-05-13 08:42:13

This doesn't work:

[i.pos_ for i in nlp("Great coffee at a place with a great view!").noun_chunks]

because noun_chunks yields Span objects, not Token objects, and only the individual Tokens carry a pos_ attribute.

You can get the POS tags in each noun chunk by iterating over its tokens:

import spacy

nlp = spacy.load("en_core_web_md")
# Each noun chunk is a Span; iterating over it yields its Tokens
for i in nlp("Great coffee at a place with a great view!").noun_chunks:
    print(i, [t.pos_ for t in i])

This gives you:

Great coffee ['ADJ', 'NOUN'] 
a place ['DET', 'NOUN'] 
a great view ['DET', 'ADJ', 'NOUN']
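Applying the same idea to the question's filter: since every token in a chunk already carries its POS tag, parse() can be reduced to one pass over the chunks, with no character offsets or membership lists. Below is a minimal sketch using a hypothetical helper, `filter_chunks` (not part of spaCy). It only assumes that each chunk yields objects with `.text` and `.pos_`, which spaCy Tokens provide. It lowercases the output words rather than the input text, so the tags may differ slightly from parsing `text.lower()` as the question does.

```python
# POS tags the question wanted to drop (copied from its parse() function).
DEFAULT_EXCLUDE = frozenset({"DET", "VERB", "SYM", "NUM"})


def filter_chunks(chunks, exclude=DEFAULT_EXCLUDE, min_words=2):
    """Return each chunk as a lowercased string with excluded POS tags removed.

    `chunks` is any iterable of token sequences whose items expose `.text`
    and `.pos_`; spaCy's `doc.noun_chunks` qualifies, because iterating a
    Span yields its Tokens. Chunks left with fewer than `min_words` words are
    dropped, mirroring the `len(new_words) > 1` check in parse().
    """
    results = []
    for chunk in chunks:
        # One pass over the chunk's own tokens; no character offsets needed.
        words = [tok.text.lower() for tok in chunk if tok.pos_ not in exclude]
        if len(words) >= min_words:
            results.append(" ".join(words))
    return results
```

With a loaded pipeline you would call `filter_chunks(nlp(text).noun_chunks)`; pass `min_words=1` to also keep single-word results such as "place" from "a place".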
Votes: 4

Stack Overflow user

Posted on 2020-05-13 02:22:52

Credit to the original source: Phrase extraction

import spacy

def get_nns(doc):
    nns = []
    for token in doc:
        # Try this with other parts of speech for different subtrees.
        if token.pos_ == 'NOUN':
            pp = ' '.join([tok.orth_ for tok in token.subtree])
            nns.append(pp)
    return nns

nlp = spacy.load('en_core_web_sm')
ex = 'I am having a Great coffee at a place with a great view!'
doc = nlp(ex)
print(get_nns(doc))

Output:

['a Great coffee', 'a place with a great view', 'a great view']
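The subtree approach can also apply the question's POS filter. Below is a minimal sketch using a hypothetical helper, `noun_subtree_phrases` (not a spaCy function). It skips subtree tokens whose tag is in `exclude` and relies only on the `Token.pos_`, `Token.text` (equivalent to `orth_`), and `Token.subtree` attributes used in the answer. With the default `exclude={"DET"}`, determiners such as "a" should disappear from the phrases, assuming the model tags them as DET.

```python
def noun_subtree_phrases(doc, exclude=frozenset({"DET"}), head_pos="NOUN"):
    """Join the syntactic subtree of every `head_pos` token, minus excluded tags.

    `doc` is any iterable of tokens exposing `.pos_`, `.text` and `.subtree`;
    a spaCy Doc works, since `Token.subtree` yields the token together with
    its syntactic descendants.
    """
    phrases = []
    for token in doc:
        if token.pos_ != head_pos:
            continue
        # Keep subtree words whose POS tag is not filtered out.
        words = [t.text for t in token.subtree if t.pos_ not in exclude]
        if words:  # skip heads whose whole subtree was filtered away
            phrases.append(" ".join(words))
    return phrases
```

Unlike noun_chunks, which do not overlap, these subtree phrases can nest: in the output above, "a great view" also appears inside "a place with a great view".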
Votes: 0
Page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/61757240
