I am encoding my labels with a Tokenizer, using the code below. The encoding works, but it starts at 1 instead of 0. How can I make it start at 0?
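# Imports are assumed here; the original post does not show them.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer

label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(labels)
training_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels))
validation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels))

The following shows that it starts at 1: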
label_tokenizer.word_index
{'credit': 10,
'deduction': 9,
'notification': 6,
'notificationcredit': 4,
'notificationfailed': 8,
'notificationfinancial': 1,
'notificationimportant': 2,
'notificationreminder': 7,
'notificationsuccess': 11,
'otp': 3,
'personal': 5,
'promotion': 12,
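'reminder': 13}

The goal is to use these labels to train a TensorFlow model. With label encoding that starts at 1, training fails with the error: Received a label value of 13 which is outside the valid range of [0, 13)
Here is the model definition. For now, to make it work, I added +1 to the number of classes in the last layer: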
model = keras.Sequential([
    keras.layers.Embedding(input_dim=max_words, output_dim=64, input_length=input_dim),
    keras.layers.Bidirectional(keras.layers.LSTM(64)),  # , return_sequences=True
    keras.layers.Dense(y_train.shape[1] + 1, activation="softmax"),
])

Posted on 2020-08-20 15:35:29
I ran into this problem this morning, using exactly the same variable names as you... ;D
Marco Cerliani is right, so, to quote him, simply:
training_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels)) - 1
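validation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels)) - 1

Subtracting 1 shifts the labels from 1..13 to 0..12. You then need to keep the number of output units equal to the actual number of classes, so make sure to remove the +1: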
model = keras.Sequential([
    keras.layers.Embedding(input_dim=max_words, output_dim=64, input_length=input_dim),
    keras.layers.Bidirectional(keras.layers.LSTM(64)),  # , return_sequences=True
    keras.layers.Dense(y_train.shape[1], activation="softmax"),
])

https://stackoverflow.com/questions/62883207
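
For what it's worth, the Keras Tokenizer starts at 1 because index 0 is reserved (it is never assigned to a word), which is why the offset is needed at all. If the labels are single tokens, one possible alternative is to skip the Tokenizer for labels and build the mapping by hand. The sketch below is only an illustration in plain Python: the sample label lists and the helper names `build_label_index` and `encode_labels` are made up for this example and are not part of the original post.

```python
# Hypothetical sample data standing in for the question's labels.
labels = ["otp", "credit", "promotion", "otp", "reminder"]
train_labels = ["otp", "credit", "otp"]


def build_label_index(all_labels):
    """Map each distinct label to an integer id, starting at 0."""
    # sorted() makes the assignment deterministic across runs.
    return {label: i for i, label in enumerate(sorted(set(all_labels)))}


def encode_labels(label_list, label_index):
    """Replace each label string with its zero-based integer id."""
    return [label_index[label] for label in label_list]


label_index = build_label_index(labels)
training_label_seq = encode_labels(train_labels, label_index)
num_classes = len(label_index)  # size of the final Dense layer, no +1 needed

print(label_index)
print(training_label_seq)
```

With a mapping like this, every encoded label falls in the range [0, num_classes), which is the range the error message in the question asks for.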