SelectKBest for text analysis

Data Science user

Asked 2016-05-26 20:39:25

Answers: 1 · Views: 755 · Followers: 0 · Votes: 0

I have a bunch of data:

Class   Tweet
-1  toxic phenol ingredient in vaccines  the detrimental effects of injected phenol  have not been fully evaluated   
-1  doctors give flu shots to pregnant women despite evidence of harm to fetus  
-1   hearus autism and the mmr vaccine  the most diabolical medical scandal of the century       cdcwhistleblower  vaxxed
-1  rt   how to naturally detox from mandatory vaccine injections     hearus  cdcwhistleblower
-1  rt   the removal of vaccine exemptions forces parents who know that vaccines injure and sicken their children into coerced ha

I'm going to do some supervised learning on the text. I pass the text through a pipeline, as follows:

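from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

K = 10  # hypothetical value: number of top-scoring features to keep

selector = SelectKBest(chi2, k=K)  # keep the K features with the highest chi-squared score
clf = SVC(kernel='linear')

p = Pipeline([('vect', CountVectorizer()),     # raw text -> token counts
              ('tfidf', TfidfTransformer()),   # counts -> TF-IDF weights
              ('feat', selector),              # keep only the K best columns
              ('clf', clf)])                   # linear SVM classifier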
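A minimal sketch of how a pipeline like the one above is trained and then used on new text, assuming scikit-learn 1.0 or later. The class -1 tweets come from the sample data; the class 1 tweets and the new tweets are hypothetical, invented only so the toy problem has two classes, and K = 5 is an arbitrary illustrative value.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Class -1 tweets are from the sample data; class 1 tweets are hypothetical,
# made up only so this toy problem has two classes.
tweets = [
    "toxic phenol ingredient in vaccines the detrimental effects of injected phenol have not been fully evaluated",
    "doctors give flu shots to pregnant women despite evidence of harm to fetus",
    "rt how to naturally detox from mandatory vaccine injections hearus cdcwhistleblower",
    "vaccines are safe and effective according to decades of research",
    "get your flu shot today to protect yourself and your family",
    "measles outbreak shows why high vaccination rates matter",
]
labels = [-1, -1, -1, 1, 1, 1]

K = 5  # hypothetical value, kept small because the toy vocabulary is small

p = Pipeline([('vect', CountVectorizer()),
              ('tfidf', TfidfTransformer()),
              ('feat', SelectKBest(chi2, k=K)),
              ('clf', SVC(kernel='linear'))])

# fit() learns the vocabulary, the IDF weights, which K columns to keep,
# and the SVM weights, all from the training tweets
p.fit(tweets, labels)

# predict() takes raw strings; each step reuses what it learned in fit()
new_tweets = ["mandatory vaccine injections harm children",
              "vaccination protects your family"]
predictions = p.predict(new_tweets)
print(predictions)
```

Because the pipeline is fitted as one object, the new tweets never need to be vectorized or reduced by hand.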

SelectKBest will return the K best features. I assume these will then be used to train the model.

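Do I need to predict new tweets using the same features? Or can I leave everything in the pipeline and use cross-validation (CV) to find the best K?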

Tags: python, nlp, scikit-learn

1 Answer

Data Science user

Answered 2016-05-26 21:00:32

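Your pipeline is now a single model that expects the same kind of input as before (raw tweet text). You can use a cross-validation (CV) scheme to choose K inside the pipeline: once it has evaluated the candidate values and picked the best K, it uses that K for prediction. For actual prediction, pass in a list of tweets and the pipeline will classify them; you don't need to specify manually which features to use, because the selection learned during fitting is reapplied automatically. If you want to know which features the pipeline selected, inspect the attributes of the fitted 'p' object, for example p.named_steps['feat'].get_support().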

Votes: 1

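Original page content provided by Data Science Stack Exchange. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: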

https://datascience.stackexchange.com/questions/11942
