
NLTK: sentiment analysis: one result

Stack Overflow user
Asked on 2015-02-25 18:34:33
1 answer · 457 views · 0 followers · Score: 2

Sorry for posting this, since the answer is probably in one of these two posts: NLTK sentiment analysis is only returning one value

or this one: Python NLTK not sentiment calculate correct

but I don't know how to apply it to my code.

I'm a complete newbie to Python and NLTK, and I hate that I have to bother you with a big block of code; sorry again.

With the code I'm using, the result is always 'pos'. I tried classifying with the positive features removed from the training set; then the result is always 'neutral'.

Can anyone tell me what I'm doing wrong? Thanks in advance! Don't mind the random test sentence I used; it just came up while I was trying to find the problem.

import re, math, collections, itertools
import nltk
import nltk.classify.util, nltk.metrics
from nltk.classify import NaiveBayesClassifier
from nltk.metrics import BigramAssocMeasures
from nltk.probability import FreqDist, ConditionalFreqDist  
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem.porter import *
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english", ignore_stopwords = True)

pos_tweets = ['I love bananas','I like pears','I eat oranges']
neg_tweets = ['I hate lettuce','I do not like tomatoes','I hate apples']
neutral_tweets = ['I buy chicken','I am boiling eggs','I am chopping vegetables']

def uni(doc):
    x = []
    y = []
    for tweet in doc:
        x.append(word_tokenize(tweet))
    for element in x:
        for word in element:
            if len(word)>2:
                word = word.lower()
                word = stemmer.stem(word)
                y.append(word)
    return y

def word_feats_uni(doc):
     return dict([(word, True) for word in uni(doc)])

def tokenizer_ngrams(document):
    all_tokens = []
    filtered_tokens = []
    for (sentence) in document:
        all_tokens.append(word_tokenize(sentence))
    return all_tokens

def get_bi (document):
    x = tokenizer_ngrams(document)
    c = []
    for sentence in x:
        c.extend([bigram for bigram in nltk.bigrams(sentence)])
    return c

def get_tri(document):
    x = tokenizer_ngrams(document)
    c = []
    for sentence in x:
        # the original called nltk.bigrams here; trigrams were clearly intended
        c.extend([trigram for trigram in nltk.trigrams(sentence)])
    return c

def word_feats_bi(doc): 
    return dict([(word, True) for word in get_bi(doc)])

def word_feats_tri(doc):
    return dict([(word, True) for word in get_tri(doc)])

def word_feats_test(doc):
    feats_test = {}
    feats_test.update(word_feats_uni(doc))
    feats_test.update(word_feats_bi(doc))
    feats_test.update(word_feats_tri(doc))
    return feats_test

pos_feats = [(word_feats_uni(pos_tweets),'pos')] + [(word_feats_bi(pos_tweets),'pos')] + [(word_feats_tri(pos_tweets),'pos')]

neg_feats = [(word_feats_uni(neg_tweets),'neg')] + [(word_feats_bi(neg_tweets),'neg')] + [(word_feats_tri(neg_tweets),'neg')]

neutral_feats = [(word_feats_uni(neutral_tweets),'neutral')] + [(word_feats_bi(neutral_tweets),'neutral')] + [(word_feats_tri(neutral_tweets),'neutral')]

trainfeats = pos_feats + neg_feats + neutral_feats

classifier = NaiveBayesClassifier.train(trainfeats)

print (classifier.classify(word_feats_test('I am chopping vegetables and boiling eggs')))

1 Answer

Stack Overflow user

Accepted answer

Posted on 2015-02-25 18:52:56

The fix is simple. Your word_feats_test returns an empty dictionary for the sentence 'I am chopping vegetables and boiling eggs'; with no features at all, the classifier falls back on its prior, which is biased toward pos.

I wrapped your sentence in a list:

print(classifier.classify(word_feats_test(
      ['I am chopping vegetables and boiling eggs'])))

neutral is printed.
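To see why the bare string produced an empty feature dict, here is a minimal pure-Python sketch of the same pitfall (using str.split() in place of word_tokenize so it runs without NLTK):

```python
def toy_feats(doc):
    # mimics uni()/word_feats_uni: iterate over doc, tokenize each
    # element, keep lowercase tokens longer than 2 characters
    feats = {}
    for tweet in doc:  # over a bare string this yields single characters!
        for word in tweet.split():
            if len(word) > 2:
                feats[word.lower()] = True
    return feats

print(toy_feats('I am chopping vegetables'))    # {} -- every "tweet" is one char
print(toy_feats(['I am chopping vegetables']))  # {'chopping': True, 'vegetables': True}
```

Iterating over a string yields its characters, each of which fails the len(word) > 2 test, so every feature is silently dropped; wrapping the string in a list restores the expected shape.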

You should use exactly the same function to compute features everywhere: on the training set, on the test set, and when classifying.
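That advice can be sketched like this (again a pure-Python stand-in, with str.split() replacing word_tokenize):

```python
def word_feats(tweets):
    # one shared extractor: lowercase words longer than 2 characters
    return {w.lower(): True
            for tweet in tweets
            for w in tweet.split()
            if len(w) > 2}

# the SAME function, fed the SAME shape (a list of strings), at every stage
train_feats = [(word_feats(['I love bananas']), 'pos'),
               (word_feats(['I hate lettuce']), 'neg')]
test_feats = word_feats(['I am chopping vegetables and boiling eggs'])
print(test_feats)
```

Because training and classification share one extractor and one input shape, a mismatch like the string-vs-list bug above cannot silently produce empty features.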

Score: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/28726940
