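from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# X_train and X_test are lists of tweet strings
tokenizer = Tokenizer()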
tokenizer.fit_on_texts(X_train)
encoded_docs = tokenizer.texts_to_sequences(X_train)
padded_sequence = pad_sequences(encoded_docs, maxlen=60)
test_tweets = tokenizer.texts_to_sequences(X_test)
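test_padded_sequence = pad_sequences(test_tweets, maxlen=60)

Even though I did not pass the oov_token argument, this code runs without any error. I expected the line test_tweets = tokenizer.texts_to_sequences(X_test) to raise an error.
How does TensorFlow handle out-of-vocabulary words at test time when no oov_token is provided?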
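Posted on 2022-03-30 14:53:11
If oov_token is None (the default), out-of-vocabulary words are silently ignored: texts_to_sequences drops them from the output sequence instead of raising an error.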
import tensorflow as tf
tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(['hello world'])
print(tokenizer.word_index)
sequences = tokenizer.texts_to_sequences(['hello friends'])
print(sequences)

{'hello': 1, 'world': 2}
[[1]]

https://stackoverflow.com/questions/71679308
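To make the contrast concrete, here is a minimal plain-Python sketch of the two behaviours: dropping unseen words when there is no OOV token, and mapping them to a reserved index when there is one. The helpers build_word_index and texts_to_sequences below are hypothetical names written for illustration. They are not the Keras implementation and they skip details such as punctuation filtering and num_words.

```python
# Hypothetical helpers for illustration only; not the Keras implementation.

def build_word_index(texts, oov_token=None):
    """Give each word a 1-based index by descending frequency.

    If oov_token is set, it is placed first and therefore gets index 1.
    """
    counts = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # sorted() is stable, so equally frequent words keep first-seen order.
    vocab = sorted(counts, key=lambda w: -counts[w])
    if oov_token is not None:
        vocab = [oov_token] + vocab
    return {word: i for i, word in enumerate(vocab, start=1)}


def texts_to_sequences(texts, word_index, oov_token=None):
    """Convert texts to index lists, dropping or replacing unseen words."""
    oov_index = word_index.get(oov_token) if oov_token is not None else None
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            if word in word_index:
                seq.append(word_index[word])
            elif oov_index is not None:
                seq.append(oov_index)  # unseen word mapped to the OOV index
            # otherwise the unseen word is silently dropped
        sequences.append(seq)
    return sequences


# No OOV token: 'friends' disappears from the sequence.
index_without = build_word_index(['hello world'])
print(texts_to_sequences(['hello friends'], index_without))  # [[1]]

# With an OOV token: 'friends' becomes index 1.
index_with = build_word_index(['hello world'], oov_token='<OOV>')
print(index_with)  # {'<OOV>': 1, 'hello': 2, 'world': 3}
print(texts_to_sequences(['hello friends'], index_with, oov_token='<OOV>'))  # [[2, 1]]
```

With the real Tokenizer the equivalent change is to pass the oov_token argument at construction time, for example Tokenizer(oov_token='<OOV>'). That token is then assigned index 1 in word_index, and words not seen during fit_on_texts are encoded as 1 instead of being dropped, so sequence lengths stay aligned with the input text.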