ValueError: Input has n_features=12 while the model has been trained with n_features=2494

Stack Overflow user
Asked 2021-09-10 09:10:23
Answers: 1, Views: 154, Followers: 0, Votes: 1

I trained a model using CountVectorizer, TfidfTransformer, and SGDClassifier.

This is the tokenizer part:

from keras.preprocessing.text import Tokenizer
# The maximum number of words to be used. (most frequent)
MAX_NB_WORDS = 50000
# Max number of words in each complaint.
MAX_SEQUENCE_LENGTH = 250
# This is fixed.
EMBEDDING_DIM = 100
tokenizer = Tokenizer(num_words=MAX_NB_WORDS, filters='!"#$%&()*+,-./:;<=>?@[\]^_`{|}~', lower=True)
tokenizer.fit_on_texts(master_df['Observation'].values)
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))

I trained the model:

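from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline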
cv=CountVectorizer(max_df=1.0,min_df=1, stop_words=stop_words, max_features=10000, ngram_range=(1,3))
X=cv.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 42, stratify=y)
sgd = Pipeline([('tfidf', TfidfTransformer()),
                ('clf', SGDClassifier(loss='hinge', penalty='l2',alpha=1e-3, random_state=42, max_iter=5, tol=None)),
               ])
sgd.fit(X_train, y_train)


y_pred = sgd.predict(X_test)

print('accuracy %s' % accuracy_score(y_pred, y_test))
print(classification_report(y_test, y_pred,target_names=my_tags))

That part works fine. But when I try to use this model to predict with the following code:

sentence="Drill was not in operation in the mine at the time of visit."
test=preprocess_text(sentence)
test=test.lower()
print(test)
test=[test] 
tokenizer.fit_on_texts(test)
word_index = tokenizer.word_index
#print(word_index)
test1=cv.fit_transform(test)
print(test1)
output=sgd.predict(test1)
output

It gives me this error:

ValueError: Input has n_features=12 while the model has been trained with n_features=2494
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18044/596445027.py in <module>
      9 test1=cv.fit_transform(test)
     10 print(test1)
---> 11 output=sgd.predict(test1)
     12 output

~\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs)
    118 
    119         # lambda, but not partial, allows help() to work with update_wrapper
--> 120         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    121         # update the docstring of the returned function
    122         update_wrapper(out, self.fn)

~\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\pipeline.py in predict(self, X, **predict_params)
    416         Xt = X
    417         for _, name, transform in self._iter(with_final=False):
--> 418             Xt = transform.transform(Xt)
    419         return self.steps[-1][-1].predict(Xt, **predict_params)
    420 

~\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\feature_extraction\text.py in transform(self, X, copy)
   1491             expected_n_features = self._idf_diag.shape[0]
   1492             if n_features != expected_n_features:
-> 1493                 raise ValueError("Input has n_features=%d while the model"
   1494                                  " has been trained with n_features=%d" % (
   1495                                      n_features, expected_n_features))

ValueError: Input has n_features=12 while the model has been trained with n_features=2494

I think the problem is in the word_index = tokenizer.word_index line, but I don't know how to fix it.


1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-09-10 09:22:05

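We never use fit_transform on the test set; we only use transform. Calling fit_transform on the new sentence relearns the vocabulary from that sentence alone, which is why the input ends up with 12 features instead of the 2494 the model was trained with. Change the line to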

test1=cv.transform(test)
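
To make the difference concrete, here is a minimal sketch on a made-up four-sentence corpus (the texts and labels are illustrative assumptions, not the asker's data). It compares the column count produced by `transform` with the one produced by refitting on the new sentence. It also moves the vectorizer inside the `Pipeline`, which is one common way to avoid the mismatch, since `predict` can then take raw text:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Illustrative toy data (an assumption, not the asker's dataset).
train_texts = [
    "drill was not in operation in the mine",
    "conveyor belt guard was missing",
    "drill operator wore no hearing protection",
    "belt guard was replaced during the visit",
]
train_labels = ["drill", "belt", "drill", "belt"]
new_text = ["drill was not in operation at the time of visit"]

cv = CountVectorizer()
X_train = cv.fit_transform(train_texts)  # the vocabulary is learned here, once

n_train = X_train.shape[1]
# transform() reuses the learned vocabulary, so the column count is unchanged.
n_transform = cv.transform(new_text).shape[1]
# Fitting a vectorizer on the new sentence learns a different, smaller vocabulary.
n_refit = CountVectorizer().fit_transform(new_text).shape[1]
print(n_train, n_transform, n_refit)

# With the vectorizer inside the pipeline, predict() accepts raw strings and
# always applies the vocabulary fitted on the training data.
model = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SGDClassifier(loss="hinge", random_state=42)),
])
model.fit(train_texts, train_labels)
prediction = model.predict(new_text)
print(prediction)
```

With this layout there is no separate `cv` object to call by hand at prediction time, so there is no opportunity to refit it by accident.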

Similarly, you should not refit the tokenizer on the test data with tokenizer.fit_on_texts(test); you should change it to

tokenizer.texts_to_sequences(test)
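
The distinction here is between learning a word index and applying one. The sketch below is a deliberately simplified, pure-Python stand-in: the `ToyTokenizer` class is hypothetical and is not Keras's implementation (for example, it numbers words in order of first appearance rather than by frequency). It only mirrors the two method names to show why the index should be fitted once on the training texts and merely applied to new text:

```python
class ToyTokenizer:
    """Hypothetical, simplified stand-in for a fit/apply tokenizer."""

    def __init__(self):
        self.word_index = {}

    def fit_on_texts(self, texts):
        # Learning step: give every previously unseen word the next free integer id.
        for text in texts:
            for word in text.lower().split():
                if word not in self.word_index:
                    self.word_index[word] = len(self.word_index) + 1

    def texts_to_sequences(self, texts):
        # Apply step: look words up in the existing index and skip unknown words.
        return [
            [self.word_index[w] for w in text.lower().split() if w in self.word_index]
            for text in texts
        ]


tok = ToyTokenizer()
tok.fit_on_texts(["drill was idle", "belt guard was missing"])
index_after_training = dict(tok.word_index)

# Applying the tokenizer leaves the learned index untouched.
seqs = tok.texts_to_sequences(["drill was not missing"])
unchanged_after_apply = tok.word_index == index_after_training
print(seqs, unchanged_after_apply)

# Refitting on new text changes the index that training relied on.
tok.fit_on_texts(["drill was not missing"])
unchanged_after_refit = tok.word_index == index_after_training
print(unchanged_after_refit)
```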

For more information, see the Tokenizer documentation and the SO thread "What does Keras Tokenizer method exactly do?"

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/69129913
