A simple Python implementation of collaborative topic modeling?

Stack Overflow user
Asked 2015-08-25 23:40:50
2 answers, 3.5K views, 0 followers, 32 votes

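I came across two papers that combine collaborative filtering (matrix factorization) with topic modeling (LDA) to recommend articles/posts to users, based on the topic terms of the articles/posts those users are interested in.

The papers (PDF) are "http://www.cs.columbia.edu/~blei/papers/WangBlei2011.pdf" and "http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/norii/pub/github-ctr.pdf".

The new algorithm is called collaborative topic regression. I was hoping to find some Python code that implements it, but had no luck. This may be a long shot, but can someone show a simple Python example?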


2 Answers

Stack Overflow user

Answered 2016-10-12 20:42:11

This should get you started (although I'm not sure why it hasn't been posted yet): https://github.com/arongdari/python-topic-model

More specifically: https://github.com/arongdari/python-topic-model/blob/master/ptm/collabotm.py

class CollaborativeTopicModel:
    """
    Wang, Chong, and David M. Blei. "Collaborative topic 
                                modeling for recommending scientific articles."
    Proceedings of the 17th ACM SIGKDD international conference on Knowledge
                                discovery and data mining. ACM, 2011.
    Attributes
    ----------
    n_item: int
        number of items
    n_user: int
        number of users
    R: ndarray, shape (n_user, n_item)
        user x item rating matrix
    """

It looks nice and straightforward. I'd still suggest at least having a look at gensim. Radim has done a fantastic job of optimizing that software.
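For readers who want to see the core idea without reading the whole repository, here is a minimal sketch of the alternating least-squares updates from the Wang and Blei paper. It is not the code from the linked repository. It assumes the item topic proportions `theta` are already available (for example from a separately trained LDA model) and holds them fixed, which is a simplification of the full algorithm. The function name `fit_ctr` and the default hyperparameter values are illustrative assumptions.

```python
import numpy as np


def fit_ctr(R, theta, lambda_u=0.01, lambda_v=100.0, a=1.0, b=0.01, n_iter=20):
    """Simplified collaborative topic regression with fixed topic proportions.

    R     : (n_user, n_item) binary matrix, 1 where a user liked an item
    theta : (n_item, K) topic proportions per item (assumed precomputed)
    Returns user factors U (n_user, K) and item factors V (n_item, K).
    """
    R = np.asarray(R, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n_user, n_item = R.shape
    K = theta.shape[1]

    U = np.zeros((n_user, K))
    V = theta.copy()               # item factors start at the topic proportions
    C = np.where(R > 0, a, b)      # confidence: high for observed, low for unobserved
    I_K = np.eye(K)

    for _ in range(n_iter):
        # u_i = (V^T C_i V + lambda_u I)^-1 V^T C_i R_i
        for i in range(n_user):
            Vc = V * C[i][:, None]
            U[i] = np.linalg.solve(V.T @ Vc + lambda_u * I_K, Vc.T @ R[i])
        # v_j = (U^T C_j U + lambda_v I)^-1 (U^T C_j R_j + lambda_v theta_j)
        for j in range(n_item):
            Uc = U * C[:, j][:, None]
            V[j] = np.linalg.solve(U.T @ Uc + lambda_v * I_K,
                                   Uc.T @ R[:, j] + lambda_v * theta[j])
    return U, V


# Toy data (made up for illustration): 3 users, 4 items, 2 topics
R = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]])
theta = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
U, V = fit_ctr(R, theta)
scores = U @ V.T   # predicted preference of each user for each item
```

A large `lambda_v` keeps each item vector close to its topic proportions, so an item nobody has rated yet can still be scored from its text alone.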

Votes: 6

Stack Overflow user

Answered 2016-12-04 02:21:05

A very simple LDA implementation using gensim. You can find more information here: https://radimrehurek.com/gensim/tutorial.html

I hope it helps.

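# Requires NLTK data (tokenizer models, the stopwords corpus, and the RSLP
# stemmer rules), which can be installed via nltk.download()
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import RSLPStemmer
from gensim import corpora, models
import gensim

# Note: RSLPStemmer targets Portuguese; for English text,
# nltk.stem.PorterStemmer is the more usual choice
st = RSLPStemmer()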
texts = []

doc1 = "Veganism is both the practice of abstaining from the use of animal products, particularly in diet, and an associated philosophy that rejects the commodity status of animals"
doc2 = "A follower of either the diet or the philosophy is known as a vegan."
doc3 = "Distinctions are sometimes made between several categories of veganism."
doc4 = "Dietary vegans refrain from ingesting animal products. This means avoiding not only meat but also egg and dairy products and other animal-derived foodstuffs."
doc5 = "Some dietary vegans choose to wear clothing that includes animal products (for example, leather or wool)." 

docs = [doc1, doc2, doc3, doc4, doc5]

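# Build the stopword set once instead of reloading the list for every token
stop_words = set(stopwords.words('english'))

for doc in docs:
    # Lowercase and tokenize, drop stopwords, then stem what remains
    tokens = word_tokenize(doc.lower())
    stopped_tokens = [w for w in tokens if w not in stop_words]
    stemmed_tokens = [st.stem(w) for w in stopped_tokens]
    texts.append(stemmed_tokens)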

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# generate LDA model using gensim  
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=2, id2word = dictionary, passes=20)
print(ldamodel.print_topics(num_topics=2, num_words=4))

[(0, u'0.066*animal + 0.065*, + 0.047*product + 0.028*philosophy'), (1, u'0.085*. + 0.047*product + 0.028*dietary + 0.028*veg')]
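The printed topics only summarize the model. A recommender built on top of LDA, such as the collaborative topic regression the question asks about, needs each document's topic proportions as a dense matrix. gensim's `LdaModel.get_document_topics` returns a list of `(topic_id, probability)` pairs per document, so a small conversion step is needed. The helper below is a sketch of that step; the name `to_dense_theta` is hypothetical, and the renormalization is an assumption about how dropped low-probability topics should be handled.

```python
import numpy as np


def to_dense_theta(doc_topics, n_topics):
    """Convert per-document (topic_id, probability) pairs to a dense matrix.

    doc_topics : list with one entry per document, each a list of
                 (topic_id, probability) pairs in the format gensim returns
    n_topics   : total number of topics in the model
    Returns an array of shape (n_docs, n_topics) whose rows sum to 1.
    """
    theta = np.zeros((len(doc_topics), n_topics))
    for d, pairs in enumerate(doc_topics):
        for topic_id, prob in pairs:
            theta[d, topic_id] = prob
    # Renormalize, since topics below the probability cutoff may be missing
    row_sums = theta.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0   # leave all-zero rows as zeros
    return theta / row_sums


# With the model above, the input could be collected like this:
# doc_topics = [ldamodel.get_document_topics(bow, minimum_probability=0.0)
#               for bow in corpus]
# theta = to_dense_theta(doc_topics, n_topics=2)
```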

Votes: 0
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/32215827
