I'm new to ML and TensorFlow, and I'm trying to train and use a standard text-generation model. When I go to train the model, I get this error:
Train for 155 steps
Epoch 1/5
2/155 [..............................] - ETA: 4:49 - loss: 2.5786
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-133-d70c02ff4270> in <module>()
----> 1 model.fit(dataset, epochs=epochs, callbacks=[checkpoint_callback])
11 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: indices[58,87] = 63 is not in [0, 63)
[[node sequential_12/embedding_12/embedding_lookup (defined at <ipython-input-131-d70c02ff4270>:1) ]]
[[VariableShape/_24]]
(1) Invalid argument: indices[58,87] = 63 is not in [0, 63)
[[node sequential_12/embedding_12/embedding_lookup (defined at <ipython-input-131-d70c02ff4270>:1) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_distributed_function_95797]
Errors may have originated from an input operation.
Input Source operations connected to node sequential_12/embedding_12/embedding_lookup:
sequential_12/embedding_12/embedding_lookup/92192 (defined at /usr/lib/python3.6/contextlib.py:81)
Input Source operations connected to node sequential_12/embedding_12/embedding_lookup:
sequential_12/embedding_12/embedding_lookup/92192 (defined at /usr/lib/python3.6/contextlib.py:81)
Function call stack:
distributed_function -> distributed_function
Data:
data['title'] = [['Sentence'],['Sentence2'], ...]
Data preparation:
tokenizer = keras.preprocessing.text.Tokenizer(num_words=209, lower=False, char_level=True)
tokenizer.fit_on_texts(df['title'])
df['encoded_with_keras'] = tokenizer.texts_to_sequences(df['title'])
dataset = df['encoded_with_keras']
dataset = tf.keras.preprocessing.sequence.pad_sequences(dataset, padding='post')
dataset = dataset.flatten()
dataset = tf.data.Dataset.from_tensor_slices(dataset)
sequences = dataset.batch(seq_len+1, drop_remainder=True)
def create_seq_targets(seq):
    input_txt = seq[:-1]
    target_txt = seq[1:]
    return input_txt, target_txt
dataset = sequences.map(create_seq_targets)
batch_size = 128
buffer_size = 10000
dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)
Model:
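As a sanity check, the `seq_len+1` batching and the `create_seq_targets` split above can be reproduced in plain Python (toy token ids and a hypothetical `seq_len`, no `tf.data`):

```python
# Plain-Python version of the seq_len+1 windowing and input/target split
# done above with tf.data (toy data; seq_len here is hypothetical).
seq_len = 4
flat = list(range(10))  # stands in for the flattened, padded token ids

# dataset.batch(seq_len + 1, drop_remainder=True)
chunks = [flat[i:i + seq_len + 1] for i in range(0, len(flat), seq_len + 1)]
chunks = [c for c in chunks if len(c) == seq_len + 1]  # drop_remainder

# create_seq_targets: the target is the input shifted one step left
pairs = [(c[:-1], c[1:]) for c in chunks]
print(pairs[0])  # ([0, 1, 2, 3], [1, 2, 3, 4])
```

Each character's prediction target is simply the next character in the same window, which is why one extra element (`seq_len+1`) is batched per example.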
vocab_size = len(tokenizer.word_index)
embed_dim = 128
rnn_neurons = 256
epochs = 5
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
def create_model(vocab_size, embed_dim, rnn_neurons, batch_size):
    model = Sequential()
    model.add(Embedding(vocab_size, embed_dim, batch_input_shape=[batch_size, None], mask_zero=True))
    model.add(LSTM(rnn_neurons, return_sequences=True, stateful=True))
    model.add(Dropout(0.2))
    model.add(LSTM(rnn_neurons, return_sequences=True, stateful=True))
    model.add(Dropout(0.2))
    model.compile(optimizer='adam', loss="sparse_categorical_crossentropy")
    return model
model.fit(dataset, epochs=epochs, callbacks=[checkpoint_callback])
I have tried changing almost every model setting, and I have tried custom tokenization and data preparation. But training always starts and then fails with this error at step 2 of 155. I don't know where to start, so any help is appreciated.
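A likely cause, judging only from the error text `indices[58,87] = 63 is not in [0, 63)`: a Keras `Tokenizer` assigns token ids starting at 1 (0 is reserved for padding), so the largest id equals `len(tokenizer.word_index)`, while an `Embedding` layer with `input_dim = len(tokenizer.word_index)` only accepts ids `0..len-1`. A minimal pure-Python sketch (with a hypothetical 63-token index standing in for the real tokenizer) of this off-by-one:

```python
# Hypothetical 63-entry index mimicking how a Keras Tokenizer numbers
# tokens: ids start at 1, with 0 reserved for padding.
word_index = {chr(ord('a') + i): i + 1 for i in range(63)}  # ids 1..63

vocab_size = len(word_index)             # 63: what the question passes as Embedding input_dim
max_token_id = max(word_index.values())  # 63: largest id the data can contain

# An embedding table of size 63 only accepts ids 0..62, so id 63 is
# out of range -- matching "63 is not in [0, 63)".
print(max_token_id >= vocab_size)        # True: the lookup fails

# Sizing the table as len(word_index) + 1 covers ids 0..63.
fixed_vocab_size = len(word_index) + 1   # 64
print(max_token_id < fixed_vocab_size)   # True
```

Under this reading, setting `vocab_size = len(tokenizer.word_index) + 1` before building the model would make every token id a valid embedding index.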
Posted on 2020-06-02 22:03:15
Try changing batch_size to 32, 16, or 8. Apparently there is a TensorFlow bug for RTX 2060/70/80 cards that makes them run out of memory.
Posted on 2020-10-29 16:53:27
In a similar situation, the following snippet helped.
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)
Posted on 2021-11-09 19:24:44
I solved this problem by adding validation_data to the fit() function:
model.fit(X, y, validation_data=(X_val, y_val))
https://stackoverflow.com/questions/60082554