I am trying to compute TF-IDF with tft.compute_and_apply_vocabulary and tft.tfidf in my Jupyter notebook, but I always get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'compute_and_apply_vocabulary/vocabulary/Placeholder' with dtype string
[[node compute_and_apply_vocabulary/vocabulary/Placeholder (defined at C:\Users\secsi\Anaconda3\envs\tf2\lib\site-packages\tensorflow_

But the placeholder's dtype actually is string.
Here is my code:
import tensorflow as tf
import tensorflow_transform as tft
with tf.Session() as sess:
    documents = [
        "a b c d e",
        "f g h i j",
        "k l m n o",
        "p q r s t",
    ]
    documents_tensor = tf.placeholder(tf.string)
    tokens = tf.compat.v1.string_split(documents_tensor)
    compute_vocab = tft.compute_and_apply_vocabulary(tokens, vocab_filename='vocab.txt')
    global_vars_init = tf.global_variables_initializer()
    tabel_init = tf.tables_initializer()
    sess.run([global_vars_init, tabel_init])
    token2ids = sess.run(tfidf, feed_dict={documents_tensor: documents})
    print(f"token2ids: {token2ids}")

Versions:
Thanks in advance!
Posted on 2019-08-21 05:42:25
Unlike ordinary TensorFlow ops, the operations in TensorFlow Transform, such as tft.compute_and_apply_vocabulary, cannot be run directly inside a Session.

To use TensorFlow Transform operations, we have to run them inside a preprocessing_fn, which is then passed to tft_beam.AnalyzeAndTransformDataset.

In this case, since you have text data, your code can be modified as shown below:
def preprocessing_fn(inputs):
    """inputs is our dataset"""
    documents = inputs['documents']
    tokens = tf.compat.v1.string_split(documents)
    compute_vocab = tft.compute_and_apply_vocabulary(tokens, top_k=VOCAB_SIZE)
    # Add one for the oov bucket created by compute_and_apply_vocabulary.
    review_bow_indices, review_weight = tft.tfidf(compute_vocab,
                                                  VOCAB_SIZE + 1)
    return {
        REVIEW_KEY: review_bow_indices,
        REVIEW_WEIGHT_KEY: review_weight,
        LABEL_KEY: inputs[LABEL_KEY]
    }
(transformed_train_data, transformed_metadata), transform_fn = (
    (train_data, RAW_DATA_METADATA)
    | 'AnalyzeAndTransform' >> tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))

For an example of how to perform data preprocessing on a text dataset (sentiment analysis), you can refer to this TensorFlow Transform example.
If you found this answer useful, please accept it and/or upvote. Thank you.
https://stackoverflow.com/questions/57447937