Question: Python Maxent classifier

Stack Overflow user

Asked 2013-04-13 11:59:59

Answers 2, Views 11.5K, Followers 0, Votes 2

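I have been using the maxent classifier in Python and it keeps failing, and I don't understand why.

I'm using the movie reviews corpus. (I'm a total noob.)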

import nltk.classify.util
from nltk.classify import MaxentClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
    # Bag-of-words features: every word in the review maps to True.
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

# Train on the first 3/4 of each class; // keeps the cutoff an integer.
negcutoff = len(negfeats) * 3 // 4
poscutoff = len(posfeats) * 3 // 4

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
classifier = MaxentClassifier.train(trainfeats)

Here is the error (I know I'm doing something wrong; please link me to how Maxent works):

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1334
    sum1 = numpy.sum(exp_nf_delta * A, axis=0)
RuntimeWarning: invalid value encountered in multiply

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1335
    sum2 = numpy.sum(nf_exp_nf_delta * A, axis=0)
RuntimeWarning: invalid value encountered in multiply

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1341
    deltas -= (ffreq_empirical - sum1) / -sum2
RuntimeWarning: invalid value encountered in divide

python

nltk

maxent

2 Answers

Stack Overflow user

Accepted answer

Posted 2013-04-13 22:59:43

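There is probably a fix for the numpy overflow issue, but since this is just a movie review classifier for learning NLTK and text classification (and you probably don't want training to take a long time anyway), I'll offer a simple workaround: you can restrict the words used in the feature sets.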
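For context on the overflow mentioned above, here is a minimal sketch of how an overflowing exponential can turn into the "invalid value encountered in multiply" warning from the question. It is only an illustration with made-up numbers (the value 1000.0 and the helper name `nan_from_overflow` are assumptions), not a diagnosis of what happens inside NLTK's `maxent.py`.

```python
import warnings

import numpy as np


def nan_from_overflow():
    """Overflow an exponential to inf, then multiply by zero to get NaN."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        with np.errstate(all="warn"):
            big = np.exp(np.array([1000.0]))   # too large for float64, becomes inf
            product = big * np.array([0.0])    # inf * 0 is undefined, becomes nan
    messages = [str(w.message) for w in caught]
    return big, product, messages


big, product, messages = nan_from_overflow()
print(big, product)
for message in messages:
    print(message)
```

Once a NaN appears, it propagates through later sums and divisions, which is consistent with the follow-up warning on the divide line.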

You can find the 300 most common words across all the reviews like this (obviously you can raise that number if you like):

all_words = nltk.FreqDist(word for word in movie_reviews.words())
# In NLTK 3, keys() is not sorted by frequency; most_common() is.
top_words = set(word for word, _ in all_words.most_common(300))

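Then all you have to do is check each review's words against top_words in your feature extractor. Also, as a suggestion, a dict comprehension is more efficient than converting a list of tuples to a dict. So it would look like this: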

def word_feats(words):
    return {word:True for word in words if word in top_words}
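To see the effect of the restricted feature set without downloading the corpus, here is a small sketch that uses `collections.Counter` in place of `nltk.FreqDist` (in NLTK 3, `FreqDist` is a `Counter` subclass). The toy reviews and the helper names `build_top_words` and `make_word_feats` are hypothetical and only serve the illustration.

```python
from collections import Counter

# Hypothetical toy corpus standing in for movie_reviews.words().
toy_reviews = [
    ["a", "great", "fun", "movie"],
    ["a", "dull", "boring", "movie"],
    ["great", "cast", "great", "plot"],
]


def build_top_words(documents, n):
    """Return the set of the n most frequent words across all documents."""
    counts = Counter(word for doc in documents for word in doc)
    return {word for word, _ in counts.most_common(n)}


def make_word_feats(top_words):
    """Build a feature extractor that keeps only words found in top_words."""
    def word_feats(words):
        return {word: True for word in words if word in top_words}
    return word_feats


top_words = build_top_words(toy_reviews, 3)
word_feats = make_word_feats(top_words)

# "boring" is outside the top 3, so it is dropped from the features.
print(word_feats(["a", "great", "boring", "movie"]))
```

The feature dictionaries stay small no matter how long a review is, which is what keeps training time and the size of the weight vector down.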

Votes 3

Stack Overflow user

Posted 2014-02-21 14:06:47

I modified and updated the code.

import nltk, nltk.classify.util, nltk.metrics
from nltk.classify import MaxentClassifier
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.probability import FreqDist, ConditionalFreqDist
from nltk.corpus import movie_reviews

def word_feats(words):
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

# Integer division keeps the cutoffs usable as slice indices.
negcutoff = len(negfeats) * 3 // 4
poscutoff = len(posfeats) * 3 // 4

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
# classifier = nltk.MaxentClassifier.train(trainfeats)

# Pick the first entry in NLTK's algorithm list and cap training at 3 iterations.
algorithm = nltk.classify.MaxentClassifier.ALGORITHMS[0]
classifier = nltk.MaxentClassifier.train(trainfeats, algorithm, max_iter=3)

classifier.show_most_informative_features(10)

all_words = nltk.FreqDist(word for word in movie_reviews.words())
# In NLTK 3, keys() is not sorted by frequency; most_common() is.
top_words = set(word for word, _ in all_words.most_common(300))

def word_feats(words):
    return {word:True for word in words if word in top_words}
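The code above only builds a training set from the first three quarters of each class. As a hedged sketch, the same cutoff can also yield a held-out test set from the remaining quarter; the helper name `split_train_test` and the toy featuresets are assumptions made for illustration.

```python
def split_train_test(featuresets, numerator=3, denominator=4):
    """Return (train, test): the leading fraction of the items, then the rest."""
    cutoff = len(featuresets) * numerator // denominator
    return featuresets[:cutoff], featuresets[cutoff:]


# Hypothetical toy featuresets standing in for negfeats and posfeats.
toy_neg = [({"bad": True}, "neg") for _ in range(8)]
toy_pos = [({"good": True}, "pos") for _ in range(8)]

neg_train, neg_test = split_train_test(toy_neg)
pos_train, pos_test = split_train_test(toy_pos)

trainfeats = neg_train + pos_train
testfeats = neg_test + pos_test
print(len(trainfeats), len(testfeats))
```

With the real data, the held-out `testfeats` could then be passed to `nltk.classify.util.accuracy(classifier, testfeats)` to estimate how well the trained classifier generalizes.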

Votes 6

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/15987554
