
Q: How do I predict the topic of a comment with the following LDA model?

Stack Overflow user

Asked 2016-12-22 04:40:12

1 answer · 833 views · 0 followers · 1 vote

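Hello, I am trying to build a topic model over several short texts. The corpus consists of comments from a social web page, and I have the following structure. First, I have a list with the documents: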

listComments = ["I like the post", "I hate to use this smartphoneee","iPhone 7 now has the best performance and battery life :)",...]


from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer(min_df=10,ngram_range=(1,3),analyzer='word')
tfidf = tfidf_vectorizer.fit_transform(listComments)

I use tfidf to build a model with those parameters, and then I apply LDA as follows:

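from sklearn.decomposition import LatentDirichletAllocation

# Using Latent Dirichlet Allocation
n_topics = 30
n_top_words = 20
# Recent scikit-learn releases call this parameter n_components;
# older releases, as used when the question was asked, called it n_topics.
lda = LatentDirichletAllocation(n_components=n_topics,
                                learning_method='online',
                                learning_offset=50.,
                                random_state=0)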

lda.fit(tfidf)
def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print("Topic #%d:" % topic_idx)
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
    print()
print("\nTopics in LDA model:")
# get_feature_names() in older scikit-learn releases
tf_feature_names = tfidf_vectorizer.get_feature_names_out()
print_top_words(lda, tf_feature_names, n_top_words)
y_pred = lda.fit_transform(tfidf)

Then I saved both models, tfidf and LDA, and ran the following experiment: given a new comment, I vectorize it with the same model.

comment = ['the car is blue']

# tfidf_vectorizer is the fitted (or reloaded) TfidfVectorizer from above
x = tfidf_vectorizer.transform(comment)

y = lda.transform(x)

print("this is the prediction",y)

I got:

this is the prediction [[ 0.03333333  0.03333333  0.03333333  0.03333333  0.03333333  0.03333333
   0.03333333  0.03333333  0.03333333  0.03333333  0.03333333  0.03333333
   0.03333333  0.03333333  0.03333333  0.03333333  0.59419197  0.03333333
   0.03333333  0.03333333  0.03333333  0.03333333  0.03333333  0.03333333
   0.03333333  0.03333333  0.03333333  0.86124492  0.03333333  0.03333333]]

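I do not understand this vector. I did some research and, although I am not sure, I believe it holds the probabilities of the comment belonging to each of the n_topics topics (30 in my case), so my new comment would most likely belong to the topic with the highest component. This is not very direct, though. My main question is: do I need to write a method that returns the index of the highest component of this transformation in order to classify a vector, or does LDA have a method that gives the topic number automatically? Thanks for the support.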

scikit-learn

lda

1 Answer

Stack Overflow user

Accepted answer

Posted 2019-06-19 14:49:10

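First, you chose a number of topics equal to n_topics (= 30). The prediction vector you get back is an array of shape (30,). Its i-th component represents the probability that the comment belongs to the i-th topic.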
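On the question of picking a single topic: as far as I know, scikit-learn's LatentDirichletAllocation does not offer a separate method that returns one topic label, so taking the index of the largest component is the usual route. Here is a minimal sketch in plain Python that uses the row printed in the question; `doc_topic` and `dominant_topic` are illustrative names, not part of scikit-learn.

```python
# Document-topic row copied from the question's printed output (30 topics).
doc_topic = [0.03333333] * 30
doc_topic[16] = 0.59419197
doc_topic[27] = 0.86124492


def dominant_topic(row):
    """Return the index of the largest component of one document-topic row."""
    return max(range(len(row)), key=lambda i: row[i])


best = dominant_topic(doc_topic)
print("dominant topic:", best, "weight:", doc_topic[best])
```

With the NumPy array returned by `lda.transform(x)`, calling `y.argmax(axis=1)` should give the same index for every row at once.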

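Keep in mind that LDA is not exclusive: a document can belong to several classes. Here, for example, I can say that your comment belongs to two different classes, with probabilities 0.86 and 0.59.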
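Because a comment can belong to several topics, one option is to report every topic whose share exceeds a cutoff rather than only the largest one. The sketch below first rescales the row so it sums to 1 (the printed values add up to about 2.39, so as shown they are relative weights rather than a normalized distribution) and then applies a threshold. The 0.2 cutoff and the names `normalize` and `topics_above` are assumptions chosen for illustration.

```python
# Document-topic row copied from the question's printed output (30 topics).
doc_topic = [0.03333333] * 30
doc_topic[16] = 0.59419197
doc_topic[27] = 0.86124492


def normalize(row):
    """Rescale a row of non-negative weights so that it sums to 1."""
    total = sum(row)
    return [value / total for value in row]


def topics_above(row, threshold=0.2):
    """Return (topic_index, share) pairs above the threshold, largest first."""
    pairs = [(i, p) for i, p in enumerate(row) if p > threshold]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


probs = normalize(doc_topic)
for idx, share in topics_above(probs):
    print("topic %d: %.3f" % (idx, share))
```

A suitable cutoff depends on the number of topics and on how peaked the distributions are, so it is worth checking against a handful of comments whose topics you already know.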

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/41275899
