I am trying to build a fake news classification model for a class, and I have been trying to implement it with Keras.
library(keras)
library(dplyr)
library(ggplot2)
library(purrr)
library(readr)
#loading data
df <- read_csv("train.csv")
test <- read_csv("test.csv")
df %>% count(label)
#splitting data
training_id <- sample.int(nrow(df), size = nrow(df)*0.8)
training <- df[training_id,]
testing <- df[-training_id,]
num_words <- 10000
max_length <- 50
text_vectorization <- layer_text_vectorization(
  max_tokens = num_words,
  output_mode = "tfidf"
)
#modeling
text_vectorization %>%
  adapt(df$text)
input <- layer_input(shape = c(1), dtype = "string")
output <- input %>%
  text_vectorization() %>%
  layer_embedding(input_dim = num_words + 1, output_dim = 16) %>%
  layer_global_average_pooling_1d() %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dropout(0.5) %>%
  layer_dense(units = 1, activation = "sigmoid")
model <- keras_model(input, output)
model %>% compile(
  optimizer = 'rmsprop',
  loss = 'binary_crossentropy',
  metrics = list('accuracy')
)
history <- model %>% fit(
  training$text,
  as.numeric(training$label == "real"),
  epochs = 30,
  batch_size = 512,
  validation_split = 0.2,
  verbose = 2
)
results <- model %>% evaluate(testing$text, as.numeric(testing$label == "real"), verbose = 0)
results
plot(history)
The problem occurs specifically in this part:
num_words <- 10000
max_length <- 50
text_vectorization <- layer_text_vectorization(
  max_tokens = num_words,
  output_mode = "tfidf"
)
It works with the output modes "count", "int", and "binary", but when I run it with tfidf I get this error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
  ValueError: TextVectorization's output_mode arg received an invalid value tfidf. Allowed values are `None`, or one of the following values: ('int', 'count', 'binary', 'tf-idf').
When I run it with tf-idf instead, I get this error:
Error in match.arg(output_mode) :
  'arg' should be one of “int”, “binary”, “count”, “tfidf”
If anyone knows a solution to this problem, I would be very grateful.
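The second error above is raised on the R side by base R's `match.arg`, before anything reaches Python. The following is a minimal sketch of that check in isolation. It assumes the wrapper's allowed values are exactly the four listed in the error message; the `choices` vector is copied from that message, not from the keras source.

```r
# Allowed values as printed in the R-side error message
# (an assumption; not read from the keras package source)
choices <- c("int", "binary", "count", "tfidf")

# "tfidf" passes the R-side validation
ok <- match.arg("tfidf", choices)

# "tf-idf" matches none of the choices, so match.arg() raises an error;
# the message is captured here instead of stopping the script
rejected <- tryCatch(
  match.arg("tf-idf", choices),
  error = function(e) conditionMessage(e)
)

print(ok)
print(rejected)
```

So the R wrapper only lets "tfidf" through, while the Python layer (per the first error message) only accepts 'tf-idf', which is why neither spelling works.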
Posted on 2020-07-20 19:19:15
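I have identified this as a bug and reported it to the R keras team on GitHub. Fortunately, since R is open source, I managed to put together a workaround. It is not perfect, because I still cannot compare it with cross-validation, but since the learning procedure already does that, it is not necessary.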
All I had to do was run a trace:
trace("layer_text_vectorization", edit = TRUE)
and place
if (output_mode == "tfidf")
  output_mode <- "tf-idf"
after
output_mode <- match.arg(output_mode)
https://stackoverflow.com/questions/62971602
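The effect of the patch described in the answer can be sketched as a standalone function. `resolve_output_mode` is a hypothetical name used only for illustration; the real `layer_text_vectorization()` does much more, and the choices below are assumed from the error message in the question.

```r
# Hypothetical helper that mirrors only the two lines involved in the patch
resolve_output_mode <- function(output_mode = c("int", "binary", "count", "tfidf")) {
  # Existing wrapper line: validate against the R-side choices
  output_mode <- match.arg(output_mode)
  # Line added through trace(): translate to the spelling that the
  # Python error message lists as allowed
  if (output_mode == "tfidf")
    output_mode <- "tf-idf"
  output_mode
}

print(resolve_output_mode("tfidf"))
print(resolve_output_mode("count"))
```

Placing the translation after `match.arg` keeps the R-side validation intact while handing Python the value it expects. Note that an edit made through `trace()` applies to the current R session only, so it would likely need to be repeated after a restart.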