I'm trying to build topic models from a data corpus. The code correctly produces the requested number of topics from the parsed data using NMF, but it breaks when the corpus length = 20, as shown below:
20
[u'bell', u'closed', u'day', u'drinks', u'enjoy', u'food', u'good', u'great', u'll', u'new', u'nice', u'original', u'people', u'phoenix', u'place', u'rd', u'reopened', u'terrific', u'try', u'weekly']
Traceback (most recent call last):
File "sklearnTfidf.py", line 238, in <module>
trainTest()
File "sklearnTfidf.py", line 185, in trainTest
posDic += buildDictionary(pos_reviews)
File "sklearnTfidf.py", line 143, in buildDictionary
sortedDict = buildTFIDF(review)
File "sklearnTfidf.py", line 110, in buildTFIDF
nmf = NMF(n_components=no_topics, random_state=1, init='nndsvd').fit(tfidf)
File "/opt/anaconda/lib/python2.7/site-packages/sklearn/decomposition/nmf.py", line 551, in fit
self.fit_transform(X, **params)
File "/opt/anaconda/lib/python2.7/site-packages/sklearn/decomposition/nmf.py", line 485, in fit_transform
W, H = self._init(X)
File "/opt/anaconda/lib/python2.7/site-packages/sklearn/decomposition/nmf.py", line 395, in _init
W, H = _initialize_nmf(X, self.n_components_)
File "/opt/anaconda/lib/python2.7/site-packages/sklearn/decomposition/nmf.py", line 116, in _initialize_nmf
x, y = U[:, j], V[j, :]
IndexError: index 1 is out of bounds for axis 1 with size 1

I'm still getting familiar with the sklearn toolset, so I accept this may well be a simple oversight on my part, since much of the code was pieced together from different examples.
# Create a dictionary of words from review
def buildDictionary(review):
    buildTFIDF(review)
    # [unrelated code]

# Extract topic models from corpus
def buildTFIDF(corpus):
    no_topics = 5
    no_features = 100
    no_top_words = 10
    tfidf_vectorizer = TfidfVectorizer(min_df=1, max_df=1.0, max_features=no_features, stop_words='english')
    tfidf = tfidf_vectorizer.fit_transform(corpus)
    tfidf_feature_names = tfidf_vectorizer.get_feature_names()
    print tfidf.getnnz()        # sanity checking
    print tfidf_feature_names   # sanity checking
    nmf = NMF(n_components=no_topics, random_state=1, init='nndsvd').fit(tfidf)
    display_topics(nmf, tfidf_feature_names, no_top_words)
    print ''

# Prints no_top_words for each feature
def display_topics(model, feature_names, no_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print "Topic %d:" % (topic_idx)
        print " ".join([feature_names[i]
                        for i in topic.argsort()[:-no_top_words - 1:-1]])

What exactly is causing this index error, and how can I correct it?
Posted on 2017-10-18 22:16:25
I suggest you look at this answer: https://stackoverflow.com/a/43336816/8187340. The problem is the value passed to the decomposition.NMF(n_components) parameter. This parameter must be less than or equal to the number of documents in your corpus.
Example: if dtm.shape returns (6, 6030), then no_topics <= 6.
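A minimal sketch of the fix (the corpus here is made up for illustration): clamp `n_components` to the smaller dimension of the TF-IDF matrix before fitting, which is the constraint that `nndsvd` initialization imposes:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

corpus = ["great food and drinks",
          "nice place to enjoy",
          "good people, terrific day"]
tfidf = TfidfVectorizer(stop_words='english').fit_transform(corpus)

# nndsvd requires n_components <= min(n_documents, n_features)
no_topics = min(5, min(tfidf.shape))
nmf = NMF(n_components=no_topics, random_state=1, init='nndsvd').fit(tfidf)
print(nmf.components_.shape)  # one row per topic, one column per term
```

With only 3 documents, `no_topics` is clamped from 5 down to 3 and the fit succeeds instead of raising the index error.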