
n_gram model with HashingVectorizer and using it together with keras

Stack Overflow user
Asked 2017-07-18 18:07:28
1 answer, 432 views, 0 followers, 0 votes

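I am doing out-of-core learning. I have to create an n_gram model of size 3, and for this purpose I have used sklearn's HashingVectorizer. Then I have to use keras to create the neural network. However, I am not sure how to specify the input shape.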

vec = HashingVectorizer(decode_error = 'ignore', n_features = 2**20, ngram_range = (3,3))
X = vec.fit_transform(tags)
y = np_utils.to_categorical(tag)


print(X.shape)
print(y.shape)


model = Sequential()
model.add(Dense(1024, input_shape = (1,X.shape[1]), activation = 'softmax'))
model.add(Dense(y.shape[1], activation = 'softmax'))

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.fit(X, y, epochs = 10, batch_size = 200)

My second edit: almost everything here is the same, but it throws an error. The code is:

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.feature_extraction import FeatureHasher
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from keras.models import Sequential
from keras.layers.recurrent import LSTM, GRU, SimpleRNN
from keras.layers import Dense
from keras.utils import np_utils
from numpy import array
from scipy.sparse import csr_matrix

text = open('eng.train').read().lower().split()
X_train = []
y_train = []

for i in range (len(text)):
    if i % 4 == 0:
        X_train.append(text[i])
    if i % 4 == 1:
        y_train.append(text[i])

unq_tags = []
for i in range(len(y_train)):
    if y_train[i] not in unq_tags:
        unq_tags.append(y_train[i])
 #hashing X_train       
vec = HashingVectorizer(decode_error = 'ignore', n_features = 2**15)
X = vec.fit_transform(X_train)
X.toarray()
#one hot encoding y_train
values = array(y_train)
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
encoded = np_utils.to_categorical(integer_encoded)

print(type(X))
print(X.shape)
print(type(encoded))
print(encoded.shape)

model = Sequential()
model.add(SimpleRNN(1024, input_shape = (X.shape[1],), activation = 'softmax'))
model.add(Dense(y.shape[1], activation = 'softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.fit(X, y, epochs = 20, batch_size = 200)

The error thrown is as follows:

<class 'scipy.sparse.csr.csr_matrix'>
(204567, 32768)
<class 'numpy.ndarray'>
(204567, 46)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-4ac701f1ade4> in <module>()
     47 '''
     48 model = Sequential()
---> 49 model.add(SimpleRNN(1024, input_shape = (X.shape[1],), activation = 'softmax'))
     50 model.add(Dense(y.shape[1], activation = 'softmax'))
     51 model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

/home/manish/anaconda3/lib/python3.6/site-packages/keras/models.py in add(self, layer)
    420                 # and create the node connecting the current layer
    421                 # to the input layer we just created.
--> 422                 layer(x)
    423 
    424             if len(layer.inbound_nodes) != 1:

/home/manish/anaconda3/lib/python3.6/site-packages/keras/layers/recurrent.py in __call__(self, inputs, initial_state, **kwargs)
    250             else:
    251                 kwargs['initial_state'] = initial_state
--> 252         return super(Recurrent, self).__call__(inputs, **kwargs)
    253 
    254     def call(self, inputs, mask=None, initial_state=None, training=None):

/home/manish/anaconda3/lib/python3.6/site-packages/keras/engine/topology.py in __call__(self, inputs, **kwargs)
    509                 # Raise exceptions in case the input is not compatible
    510                 # with the input_spec specified in the layer constructor.
--> 511                 self.assert_input_compatibility(inputs)
    512 
    513                 # Collect input shapes to build layer.

/home/manish/anaconda3/lib/python3.6/site-packages/keras/engine/topology.py in assert_input_compatibility(self, inputs)
    411                                      self.name + ': expected ndim=' +
    412                                      str(spec.ndim) + ', found ndim=' +
--> 413                                      str(K.ndim(x)))
    414             if spec.max_ndim is not None:
    415                 ndim = K.ndim(x)

ValueError: Input 0 is incompatible with layer simple_rnn_8: expected ndim=3, found ndim=2

I only made a few tweaks, but I am getting this error.


1 Answer

Stack Overflow user

Accepted answer

Answered 2017-07-18 21:01:49

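When you specify input_shape = (1,X.shape[1]), the model expects input with dimensions [n_samples, 1, 1048576]. That is 3 dimensions, but your actual data has only 2. You should therefore remove the 1 from input_shape.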
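To make the dimension mismatch concrete, here is a minimal sketch that only inspects shapes. The toy `docs` list is hypothetical, since the question's `tags` data is not shown; the vectorizer settings are the ones from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical toy corpus standing in for the question's `tags`.
docs = [
    "the quick brown fox jumps",
    "over the lazy dog today",
    "hashing keeps memory use bounded",
]

vec = HashingVectorizer(decode_error="ignore", n_features=2**20, ngram_range=(3, 3))
X = vec.transform(docs)  # HashingVectorizer is stateless, so transform is enough

print(X.shape)  # (3, 1048576)
print(X.ndim)   # 2

# input_shape=(1, X.shape[1]) would require an extra axis of length 1 per sample.
# Only two rows are densified here to keep memory use small.
batch_3d = X[:2].toarray()[:, np.newaxis, :]
print(batch_3d.shape)  # (2, 1, 1048576)
```

The vectorizer output has two axes, (n_samples, n_features), while the shape the model was told to expect has three once the batch axis is counted.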

Try input_shape = (X.shape[1],)
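A minimal sketch of the corrected model might look like the following. Several things here are assumptions rather than part of the original post: the toy `docs` and `labels`, the reduced `n_features = 2**10` (to keep dense batches tiny), the `relu` hidden activation, and the explicit `Input` layer, which newer Keras versions prefer and which stands in for passing input_shape = (X.shape[1],) to the first layer. Batches are converted to dense arrays one at a time, which is one way to stay within memory for out-of-core training.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from keras.models import Sequential
from keras.layers import Dense, Input

# Hypothetical toy data; the question's real corpus and labels are not shown.
docs = [
    "the quick brown fox jumps",
    "over the lazy dog today",
    "hashing keeps memory use bounded",
    "small batches fit in memory",
]
labels = np.array([0, 1, 0, 1])
n_classes = 2
y = np.eye(n_classes)[labels]  # one-hot encoding of the integer labels

# Assumption: a small n_features so that densified batches stay tiny.
vec = HashingVectorizer(decode_error="ignore", n_features=2**10, ngram_range=(3, 3))
X = vec.transform(docs)  # sparse matrix of shape (4, 1024)

model = Sequential()
model.add(Input(shape=(X.shape[1],)))  # per-sample shape only, no batch axis
model.add(Dense(16, activation="relu"))
model.add(Dense(n_classes, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Train on one small dense batch at a time instead of densifying all of X.
batch_size = 2
for start in range(0, X.shape[0], batch_size):
    xb = X[start:start + batch_size].toarray()
    yb = y[start:start + batch_size]
    model.train_on_batch(xb, yb)

probs = model.predict(X.toarray(), verbose=0)
print(probs.shape)  # (4, 2)
```

With real data the batches would come from reading the corpus in chunks, and the loop would be repeated for several epochs.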

Please have a look at the documentation to understand it.
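The second edit in the question hits the same rule from the other direction: Keras recurrent layers such as SimpleRNN expect 3-D input of shape (samples, timesteps, features), which is what "expected ndim=3, found ndim=2" reports. The sketch below is one possible way to satisfy that, assuming each sample is treated as a sequence of length 1. The random data and layer sizes are hypothetical stand-ins for a densified batch of hashed features.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input, SimpleRNN

# Hypothetical stand-in for one densified batch of hashed features and labels.
n_samples, n_features, n_classes = 8, 32, 3
rng = np.random.default_rng(0)
X2d = rng.random((n_samples, n_features))  # 2-D, like the vectorizer output
y = np.eye(n_classes)[rng.integers(0, n_classes, n_samples)]

# Add a timesteps axis of length 1: (samples, timesteps, features).
X3d = X2d.reshape(n_samples, 1, n_features)

model = Sequential()
model.add(Input(shape=(1, n_features)))  # (timesteps, features), no batch axis
model.add(SimpleRNN(16))
model.add(Dense(n_classes, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X3d, y, epochs=1, batch_size=4, verbose=0)

preds = model.predict(X3d, verbose=0)
print(preds.shape)  # (8, 3)
```

With a single timestep the recurrent layer has no sequence to carry state across, so this mainly shows the shape contract. Whether a recurrent layer is the right choice for hashed n-gram features is a separate question.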

Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/45163639
