I have been at this all day, but no luck.
I managed to narrow the problem down to the TfidfVectorizer line.
Here is my working code:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
vectorizer.fit(xtrain)
X_train_count = vectorizer.transform(xtrain)
X_test_count = vectorizer.transform(xval)
X_train_count
from keras.models import Sequential
from keras import layers
input_dim = X_train_count.shape[1] # Number of features
model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
history = model.fit(X_train_count, ytrain,
                    epochs=10,
                    verbose=False,
                    validation_data=(X_test_count, yval),
                    batch_size=10)

But when I change it to
from sklearn.feature_extraction.text import TfidfVectorizer
#TF-IDF initializer
vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)
vectorizer.fit(xtrain)
X_train_count = vectorizer.transform(xtrain)
X_test_count = vectorizer.transform(xval)
X_train_count
from keras.models import Sequential
from keras import layers
input_dim = X_train_count.shape[1] # Number of features
model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
history = model.fit(X_train_count, ytrain,
                    epochs=10,
                    verbose=False,
                    validation_data=(X_test_count, yval),
                    batch_size=10)

the only things that changed being these two lines:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)

then I get this error:
InvalidArgumentError: indices[1] = [0,997] is out of order. Many sparse ops require sorted indices.
Use tf.sparse.reorder to create a correctly ordered copy.
[Op:SerializeManySparse]
How can I fix this, and why does it happen?
Posted on 2020-07-13 07:32:50
vectorizer.transform(...) produces a sparse matrix, which Keras does not handle well. You just need to convert it into a plain (dense) array, which can be done like this:

vectorizer.transform(...).toarray()

Source: https://stackoverflow.com/questions/62871108
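Concretely, the fix drops into the original pipeline like this (a minimal, self-contained sketch; the toy corpus and the xtrain/xval variables below are made-up stand-ins for the asker's data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

# Stand-in corpus (hypothetical; replace with your real xtrain / xval).
xtrain = ["good movie", "bad movie", "great film", "terrible film"]
xval = ["good film", "bad film"]

vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)
vectorizer.fit(xtrain)

# .toarray() converts the scipy sparse matrix into a dense NumPy array,
# which Keras accepts without the sorted-indices requirement.
X_train_count = vectorizer.transform(xtrain).toarray()
X_test_count = vectorizer.transform(xval).toarray()

print(type(X_train_count))  # <class 'numpy.ndarray'>
print(X_train_count.shape)
```

The dense arrays can then be passed straight to model.fit(...) as in the CountVectorizer version. Note that densifying can use a lot of memory for large vocabularies, which is one reason to keep max_features bounded as the asker already does.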