我想从我在SparklyR中的模型中提取SparklyR。到目前为止,我有以下可重复的代码正在工作:
library(sparklyr)
library(dplyr)
sc <- spark_connect(method = "databricks")
dtrain <- data_frame(text = c("Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"Chinese Macao",
"Tokyo Japan Chinese"),
doc_id = 1:4,
class = c(1, 1, 1, 0))
dtrain_spark <- copy_to(sc, dtrain, overwrite = TRUE)
pipeline <- ml_pipeline(
ft_tokenizer(sc, input_col = "text", output_col = "tokens"),
ft_count_vectorizer(sc, input_col = 'tokens', output_col = 'myvocab'),
ml_decision_tree_classifier(sc, label_col = "class",
features_col = "myvocab",
prediction_col = "pcol",
probability_col = "prcol",
raw_prediction_col = "rpcol")
)
model <- ml_fit(pipeline, dtrain_spark)当我试图在下面运行ml_stage步骤时,我发现我不能提取feature_importances的向量,而是一个函数。前一篇文章(how to extract the feature importances in Sparklyr?)将其显示为我想要获得的向量。我在这里会犯什么错误?我还需要采取另一步来展开函数并在这里得到一个值的向量吗?
ml_stage(model, 3)$feature_importances下面是我对ml_stage的输出(而不是一个值的向量):
function (...)
{
tryCatch(.f(...), error = function(e) {
if (!quiet)
message("Error: ", e$message)
otherwise
}, interrupt = function(e) {
stop("Terminated by user", call. = FALSE)
})
}
<bytecode: 0x559a0d438278>
<environment: 0x559a0ce8e840>发布于 2022-03-23 19:38:40
我不确定这是否是您想要的,但是可以结合向量器模型和词汇表来提取模型的feature_importances,这将产生一个包含文本重要性的表。您可以使用以下代码:
library(sparklyr)
library(dplyr)
sc <- spark_connect(method = "databricks")
dtrain <- data_frame(text = c("Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"Chinese Macao",
"Tokyo Japan Chinese"),
doc_id = 1:4,
class = c(1, 1, 1, 0))
dtrain_spark <- copy_to(sc, dtrain, overwrite = TRUE)
pipeline <- ml_pipeline(
ft_tokenizer(sc, input_col = "text", output_col = "tokens"),
ft_count_vectorizer(sc, input_col = 'tokens', output_col = 'myvocab'),
ml_decision_tree_classifier(sc, label_col = "class",
features_col = "myvocab",
prediction_col = "pcol",
probability_col = "prcol",
raw_prediction_col = "rpcol")
)
model <- ml_fit(pipeline, dtrain_spark)
tibble(
token = unlist(ml_stage(model, 'count_vectorizer')$vocabulary),
importance = ml_stage(model, 'decision_tree_classifier')$feature_importances
)https://stackoverflow.com/questions/71504901
复制相似问题