I have a bunch of data:
Class Tweet
-1 toxic phenol ingredient in vaccines the detrimental effects of injected phenol have not been fully evaluated
-1 doctors give flu shots to pregnant women despite evidence of harm to fetus
-1 hearus autism and the mmr vaccine the most diabolical medical scandal of the century cdcwhistleblower vaxxed
-1 rt how to naturally detox from mandatory vaccine injections hearus cdcwhistleblower
-1 rt the removal of vaccine exemptions forces parents who know that vaccines injure and sicken their children into coerced ha
I am doing some supervised learning on the text. I pass the text through a pipeline, as follows:
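from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import SVC

selector = SelectKBest(chi2, k=K)  # K: number of features to keep
clf = SVC(kernel='linear')
p = Pipeline([('vect', CountVectorizer()),
              ('tfidf', TfidfTransformer()),
              ('feat', selector),
              ('clf', clf)])
SelectKBest will return the K best features. I assume these will then be used to train the model.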
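Do I need to predict new tweets using the same features? Or can I just leave the pipeline as it is and use cross-validation (CV) to find the best K?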
Posted on 2016-05-26 21:00:32
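Your pipeline is now a single model that expects the same kind of input as before, i.e. raw text. You can use a CV scheme to select K inside the pipeline: once it has evaluated the candidate values and chosen the best K, it will use that K when predicting. For actual prediction, pass in a list of tweets and it will classify them; you do not need to specify manually which features to use, because the fitted vectorizer and selector apply the same transformation to new text. If you want to know which features the pipeline selected, you can inspect the attributes of the 'p' object, for example p.named_steps['feat'].get_support().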
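As a rough sketch of the approach described above, the snippet below tunes K with scikit-learn's GridSearchCV, predicts on new tweets, and lists the selected terms. The toy tweets, labels, candidate k values, and cv=2 are all assumptions made up for illustration; they are not the asker's data or settings.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy data; the real tweets and class labels would go here.
tweets = [
    "vaccines cause harm to children",
    "flu shots harm pregnant women",
    "toxic ingredient in vaccines",
    "detox from vaccine injections",
    "vaccines save lives every year",
    "flu shots protect pregnant women",
    "vaccine safety confirmed by study",
    "get your flu shot today",
]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]

p = Pipeline([('vect', CountVectorizer()),
              ('tfidf', TfidfTransformer()),
              ('feat', SelectKBest(chi2)),
              ('clf', SVC(kernel='linear'))])

# '<step name>__<parameter>' addresses a parameter of a pipeline step.
# The candidate values for k are assumptions chosen for this tiny data set.
param_grid = {'feat__k': [2, 4, 'all']}

# Each candidate k is scored by cross-validation; the best pipeline is
# then refit on all of the data (refit=True is the default).
search = GridSearchCV(p, param_grid, cv=2)
search.fit(tweets, labels)
best_k = search.best_params_['feat__k']

# New tweets are passed as raw text; the fitted pipeline vectorizes them
# and applies the same feature selection before classifying.
predictions = search.predict(["vaccines harm children",
                              "flu shots protect lives"])

# Recover the names of the selected features from the fitted steps.
best = search.best_estimator_
vect = best.named_steps['vect']
mask = best.named_steps['feat'].get_support()
terms = sorted(vect.vocabulary_, key=vect.vocabulary_.get)  # ordered by column index
selected = [term for term, keep in zip(terms, mask) if keep]
print(best_k, list(predictions), selected)
```

Because feature selection sits inside the pipeline, it is refit on each training fold during the search, so the held-out fold never influences which features are chosen.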
https://datascience.stackexchange.com/questions/11942