
Q: AttributeError: lower not found; removing infrequent features from Sklearn CountVectorizer?

Stack Overflow user

Asked 2021-11-07 03:08:05

Answers 1 · Views 70 · Followers 0 · Votes 0

Building the corpus and vocabulary

K = 10
XYtr['description'] = XYtr['description'].fillna("nan")
Xte['description'] = Xte['description'].fillna("nan")
corpus = list(XYtr['description'])+list(Xte['description'])
vectorizer = CountVectorizer()
corpus = vectorizer.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components = K)
lda.fit(corpus)
#There are no problems until here

# Create a list of (term, frequency) tuples sorted by their frequency
sum_words = corpus.sum(axis=0) 
words_freq = [(word, sum_words[0, idx]) for word, idx in vectorizer.vocabulary_.items()]
words_freq = sorted(words_freq, key = lambda x: x[1])

# Keep only the terms in a list
vocabulary, _ = zip(*words_freq[:int(total_features * 0.2)])
vocabulary = list(vocabulary)

#Finally, we use the vocabulary to limit the model to the less frequent terms.
bottom_vect = CountVectorizer(vocabulary=vocabulary)
topics = bottom_vect.fit_transform(corpus)

The last line of this code raises "AttributeError: lower not found", so I cannot obtain topics.

Any suggestions would be appreciated.

Here are a few rows of my dataset

XYtr：

Xte：

python

pandas

scikit-learn

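1 Answer

Stack Overflow user

Answered 2021-11-07 18:52:50

You get this error because you overwrote corpus with the output of CountVectorizer(). After corpus = vectorizer.fit_transform(corpus), corpus is a sparse document-term matrix rather than a list of strings, so the second vectorizer fails when it tries to call lower() on each item. For example, take this corpus: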

corpus = ['This is the first document.',
'This document is the second document.',
'And this is the third one.',
'Is this the first document?']
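To make the cause concrete, here is a minimal sketch that reproduces the failure on this example corpus. The names docs, matrix and error, and the one-word vocabulary, are illustrative assumptions and not part of the question's code; the exact wording of the exception message may vary with the installed SciPy version.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Same four example documents as above (renamed to docs for this sketch).
docs = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

# First pass: list of strings -> sparse document-term matrix.
matrix = CountVectorizer().fit_transform(docs)

# Second pass on the matrix instead of the strings: the vectorizer tries to
# lowercase each "document", but each item is a sparse row, not a str.
try:
    CountVectorizer(vocabulary=["first"]).fit_transform(matrix)
    error = None
except AttributeError as exc:
    error = exc

print(type(matrix).__name__, "->", repr(error))
```

The failure happens inside the second vectorizer's preprocessing step, which is why the traceback points at the last line of the question's code rather than at the line where corpus was overwritten.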

Assign the output of CountVectorizer() to a different object, X, so that corpus keeps holding the raw strings:

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components = 2)
lda.fit(X)

sum_words = X.sum(axis=0) 
words_freq = [(word, sum_words[0, idx]) for word, idx in vectorizer.vocabulary_.items()]
words_freq = sorted(words_freq, key = lambda x: x[1])

total_features = len(words_freq)
vocabulary, _ = zip(*words_freq[:int(total_features * 0.2)])
vocabulary = list(vocabulary)

Then re-run your CountVectorizer on the original documents, which corpus still contains:

bottom_vect = CountVectorizer(vocabulary=vocabulary)
topics = bottom_vect.fit_transform(corpus)
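For reference, the steps above can be combined into one self-contained sketch. It makes a few assumptions beyond the answer: the LDA step is left out because it does not affect the error, the 0.2 fraction is stored in a hypothetical variable named fraction, and ties between equally rare words are broken alphabetically so the selected vocabulary is deterministic.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

# Keep the raw strings in corpus; store the count matrix under another name.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

# Total count of each term across all documents, as a flat array.
sum_words = np.asarray(X.sum(axis=0)).ravel()
words_freq = [(word, int(sum_words[idx]))
              for word, idx in vectorizer.vocabulary_.items()]

# Sort by ascending frequency; the alphabetical tie-break is an assumption
# added here so that the result does not depend on dictionary order.
words_freq = sorted(words_freq, key=lambda x: (x[1], x[0]))

# Keep the least frequent 20% of the terms.
fraction = 0.2
total_features = len(words_freq)
vocabulary = [word for word, _ in words_freq[:int(total_features * fraction)]]

# Re-vectorize the original strings, restricted to the rare terms.
bottom_vect = CountVectorizer(vocabulary=vocabulary)
topics = bottom_vect.fit_transform(corpus)

print(vocabulary, topics.shape)
```

With only nine distinct terms in this toy corpus, 20% rounds down to a single term, so topics has one column. On a real dataset the same code keeps proportionally more terms.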

Votes 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/69869503
