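I'm trying to train a complex neural network in Keras to predict the tags of Stack Exchange questions about cooking.
The first question in my dataset looks like this: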
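```
id 2
title How should I cook bacon in an oven?
content <p>I've heard of people cooking bacon in an ov...
tags oven cooking-time bacon
Name: 1, dtype: object
```

I removed the HTML tags with BeautifulSoup and stripped the punctuation. Since the question bodies are long, I decided to focus on the titles. I used sklearn to vectorize the words in the titles, but that produced more than 8000 distinct words (excluding stop words). So I decided to POS-tag the titles and keep only the nouns and gerunds.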
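The noun/gerund filter amounts to a set-membership test on the Penn Treebank tags NN (singular noun), NNS (plural noun) and VBG (gerund). A minimal sketch that leaves out the tagger itself: the (word, tag) pairs below are written by hand and are hypothetical; in the question they come from `nltk.pos_tag`.

```python
# Penn Treebank tags kept in the question: singular noun, plural noun, gerund
KEEP_TAGS = {"NN", "NNS", "VBG"}

def keep_nouns_and_gerunds(tagged_tokens):
    """Join the words whose POS tag is in KEEP_TAGS, preserving their order."""
    return " ".join(word for word, tag in tagged_tokens if tag in KEEP_TAGS)

# Hand-written (word, tag) pairs, hypothetical stand-ins for nltk.pos_tag output
title_1 = [("How", "WRB"), ("should", "MD"), ("I", "PRP"), ("cook", "VB"),
           ("bacon", "NN"), ("in", "IN"), ("an", "DT"), ("oven", "NN"), ("?", ".")]
title_2 = [("Frying", "VBG"), ("eggs", "NNS"), ("without", "IN"), ("oil", "NN")]

print(keep_nouns_and_gerunds(title_1))  # bacon oven
print(keep_nouns_and_gerunds(title_2))  # Frying eggs oil
```

Using a set keeps the condition readable if more tags (for example VBP) are added later, instead of extending a chain of `or` comparisons.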
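```python
import nltk
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(stop_words='english')
titles = dataframes['cooking']['title']

pos_titles = []
for title in titles:
    # Keep only singular/plural nouns (NN, NNS) and gerunds (VBG)
    pos = []
    pt_titl = nltk.pos_tag(word_tokenize(title))
    for pt in pt_titl:
        if pt[1] == 'NN' or pt[1] == 'NNS' or pt[1] == 'VBG':  # or pt[1]=='VBP' or pt[1]=='VBS':
            pos.append(pt[0])
    pos_titles.append(" ".join(pos))

# Bag-of-words counts of the filtered titles
X = vectorizer.fit_transform(pos_titles)
```

This is my input. I also vectorized the tags and extracted dense matrices for both the inputs and the labels: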
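```python
tags = [" ".join(x) for x in dataframes['cooking']['tags']]
Xd = X.todense()
# Refits the same vectorizer on the tags; X has already been computed above
Y = vectorizer.fit_transform(tags)
Yd = Y.todense()
```

Splitting the data into training and validation sets: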
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(Xd, Yd, test_size=0.33, random_state=42)
```

Now I'm trying to train a Conv1D network:
```python
from keras.models import Sequential
from keras.layers import Dense, Conv1D, Embedding, GlobalMaxPooling1D, MaxPooling1D

model = Sequential()
model.add(Embedding(Xd.shape[1],
                    128,
                    input_length=Xd.shape[1]))
model.add(Conv1D(32, 5, activation='relu'))
model.add(MaxPooling1D(100, strides=50))
model.add(Conv1D(32, 5, activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(Yd.shape[1], activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=10, verbose=1)
```

But the accuracy is very low, and the loss barely improves from one epoch to the next:
```
Epoch 1/10
10320/10320 [==============================] - 401s - loss: 15.8098 - acc: 0.0604
Epoch 2/10
10320/10320 [==============================] - 339s - loss: 15.5671 - acc: 0.0577
Epoch 3/10
10320/10320 [==============================] - 314s - loss: 15.5509 - acc: 0.0578
Epoch 4/10
10320/10320 [==============================] - 34953s - loss: 15.5493 - acc: 0.0578
Epoch 5/10
10320/10320 [==============================] - 323s - loss: 15.5587 - acc: 0.0578
Epoch 6/10
 6272/10320 [=================>............] - ETA: 133s - loss: 15.6005 - acc: 0.0550
```

Posted on 2019-01-09 19:19:38
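This Wildcard blog post explains very clearly how to use 1-D convolutions on text. Debo from the data science team at x.ai provides some example Keras code that classifies text with a character-based model (the input documents are sequences of one-hot encoded characters rather than words or POS tags):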
```python
from keras.models import Model
from keras.layers import Input, Dense, Dropout, Flatten, Conv1D, MaxPooling1D

# Hypothetical hyperparameters for illustration; the snippet leaves them undefined
maxlen = 1014                        # characters per document
vocab_size = 70                      # size of the one-hot character alphabet
nb_filter = 256                      # filters per convolution layer
filter_kernels = [7, 7, 3, 3, 3, 3]  # kernel size of each of the six conv layers
dense_outputs = 1024                 # units in each fully connected layer
n_out = 10                           # number of classes

inputs = Input(shape=(maxlen, vocab_size), name='input', dtype='float32')
conv = Conv1D(filters=nb_filter, kernel_size=filter_kernels[0],
              padding='valid', activation='relu')(inputs)
conv = MaxPooling1D(pool_size=3)(conv)
conv1 = Conv1D(filters=nb_filter, kernel_size=filter_kernels[1],
               padding='valid', activation='relu')(conv)
conv1 = MaxPooling1D(pool_size=3)(conv1)
conv2 = Conv1D(filters=nb_filter, kernel_size=filter_kernels[2],
               padding='valid', activation='relu')(conv1)
conv3 = Conv1D(filters=nb_filter, kernel_size=filter_kernels[3],
               padding='valid', activation='relu')(conv2)
conv4 = Conv1D(filters=nb_filter, kernel_size=filter_kernels[4],
               padding='valid', activation='relu')(conv3)
conv5 = Conv1D(filters=nb_filter, kernel_size=filter_kernels[5],
               padding='valid', activation='relu')(conv4)
conv5 = MaxPooling1D(pool_size=3)(conv5)
conv5 = Flatten()(conv5)
z = Dropout(0.5)(Dense(dense_outputs, activation='relu')(conv5))
z = Dropout(0.5)(Dense(dense_outputs, activation='relu')(z))
pred = Dense(n_out, activation='softmax', name='output')(z)
model = Model(inputs=inputs, outputs=pred)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
```

The last three lines are the important part. You can't use a softmax output or the 'categorical_crossentropy' loss for multi-label tagging, which is your problem: softmax forces the output probabilities to sum to 1, so it assumes exactly one correct label per example. Either break your tagging problem into several binary classification problems, or use a different loss function such as 'binary_crossentropy'. With binary_crossentropy, use a sigmoid activation on the output layer instead of softmax. For more detail on multi-label tagging problems in Keras and TF, see this SO answer.
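To see why the output activation matters, here is a framework-free sketch in plain Python (not Keras code) that compares softmax and sigmoid outputs on a multi-hot target and computes a hand-written binary cross-entropy. The tag count and logits are hypothetical. In the question's model, the corresponding change would be `Dense(Yd.shape[1], activation='sigmoid')` with `loss='binary_crossentropy'`.

```python
import math

def softmax(logits):
    """Softmax: coupled outputs that always sum to 1 (suits exactly one label)."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Sigmoid: an independent probability in (0, 1) for a single tag."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Average of the per-tag binary log losses, i.e. one binary problem per tag."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)      # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical question with 4 candidate tags, 3 of which apply (multi-hot target)
y_true = [1, 1, 1, 0]
logits = [3.0, 3.0, 3.0, -3.0]              # raw scores that favour the right tags

soft = softmax(logits)                       # the three correct tags share one unit of probability
sig = [sigmoid(z) for z in logits]           # each correct tag can approach 1 on its own
predicted = [int(p > 0.5) for p in sig]      # per-tag decision with a 0.5 threshold

print([round(p, 3) for p in soft])
print([round(p, 3) for p in sig])
print(predicted, round(binary_crossentropy(y_true, sig), 4))
```

Even with confident logits, softmax can give each of the three correct tags only about a third of the probability mass, while the sigmoid outputs can all be close to 1 and a simple threshold recovers the full tag set. The averaging over tags here is similar in spirit to Keras's `binary_crossentropy`, which also averages over the last axis.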
For a more thorough explanation, see Chapter 7 of my book, "NLP in Action".
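On the target side, the multi-label setup above needs one 0/1 column per whole tag, and each column is then the label of one binary problem ("does this tag apply?"). A minimal plain-Python sketch, assuming the tags are space-separated strings as in the example row; the tag strings are hypothetical. It also shows that scikit-learn's default `CountVectorizer` token pattern, `(?u)\b\w\w+\b`, treats hyphens as separators, so a tag like `cooking-time` would be split into two tokens. scikit-learn's `MultiLabelBinarizer` is the library counterpart of `multi_hot` here.

```python
import re

# scikit-learn's default CountVectorizer token_pattern; hyphens act as separators
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

def multi_hot(tag_strings):
    """Build a vocabulary of whole tags and one 0/1 row per question."""
    tag_lists = [s.split() for s in tag_strings]          # split on whitespace only
    vocab = sorted({tag for tags in tag_lists for tag in tags})
    index = {tag: i for i, tag in enumerate(vocab)}
    rows = []
    for tags in tag_lists:
        row = [0] * len(vocab)
        for tag in tags:
            row[index[tag]] = 1
        rows.append(row)
    return vocab, rows

# Hypothetical tag strings in the same format as the example row
tags = ["oven cooking-time bacon", "bacon frying", "oven"]

print(re.findall(DEFAULT_TOKEN_PATTERN, tags[0]))  # ['oven', 'cooking', 'time', 'bacon']

vocab, Y = multi_hot(tags)
print(vocab)  # ['bacon', 'cooking-time', 'frying', 'oven']
for row in Y:
    print(row)

# Column j of Y is the 0/1 target of a separate binary problem for tag vocab[j]
bacon_targets = [row[vocab.index("bacon")] for row in Y]
print(bacon_targets)  # [1, 1, 0]
```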
https://datascience.stackexchange.com/questions/17701