I am trying to run lm with a categorical predictor on sparklyr. An example that works in plain R fails in sparklyr:
# this works
lm(Petal.Length ~ as.factor(Species), data = iris)
# this fails
spark_apply(
  iris_tbl,
  function(e) broom::tidy(lm(Petal_Length ~ as.factor(Species), e)),
  names = c("term", "estimate", "std.error", "statistic", "p.value"),
  # group_by = "Species"
)
I tried to mimic this example, simply replacing the independent variable with a categorical one.
Error log:
...
21/06/09 22:48:01 ERROR sparklyr: RScript (3130) terminated unexpectedly: contrasts can be applied only to factors with 2 or more levels
21/06/09 22:48:01 ERROR sparklyr: RScript (3130) collected callstack:
16: stop("contrasts can be applied only to factors with 2 or more levels")
15: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
14: model.matrix.default(mt, mf, contrasts)
13: model.matrix(mt, mf, contrasts)
12: lm(Petal_Length ~ as.factor(Species), e)
11: broom::tidy(lm(Petal_Length ~ as.factor(Species), e))
(21/06/09 22:48:01 INFO sparklyr: Session (3130) is shutting down with expected SocketException,java.net.SocketException: Socket closed)
21/06/09 22:48:01 ERROR sparklyr: Worker (3130) failed to complete R process
(21/06/09 22:48:01 ERROR sparklyr: Worker (3130) failed to run rscript: ,java.lang.Exception: sparklyr worker rscript failure with status 255, check worker logs for details.)
21/06/09 22:48:01 INFO sparklyr: Worker (3130) completed wait using lock for RScript
21/06/09 22:48:01 ERROR Executor: Exception in task 0.0 in stage 704.0 (TID 5010)
java.lang.Exception: sparklyr worker rscript failure with status 255, check worker logs for details.
at sparklyr.Rscript.init(rscript.scala:83)
at sparklyr.WorkerApply$$anon$2.run(workerapply.scala:125)
21/06/09 22:48:01 INFO sparklyr: Session (3130) is terminating backend
21/06/09 22:48:01 ERROR TaskSetManager: Task 0 in stage 704.0 failed 1 times; aborting job
...
Answer posted on 2021-09-23 19:46:42
Linking the GitHub issue that resolves this problem:
https://github.com/sparklyr/sparklyr/issues/3139
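For context, the error in the log can be reproduced without Spark. The spark_apply documentation states that, when columns is not specified, the function is first run on a small sample of rows (10 by default) to infer the output schema. The sketch below assumes that this sample is what triggers the failure and that Species arrives in the worker as a character column. Using head(iris, 10) as a stand-in for that sample is an illustrative assumption, not sparklyr's actual sampling code.

```r
# Plain-R sketch of the failure; no Spark connection is needed.
# Assumption: the worker only sees a small sample of rows (here, the first 10).
sample_rows <- head(iris, 10)

# Assumption: Spark passes string columns to R as character vectors.
sample_rows$Species <- as.character(sample_rows$Species)

# Count the distinct Species values that survive in the sample.
n_levels <- nlevels(as.factor(sample_rows$Species))

# Capture the error message instead of aborting the script.
err <- tryCatch(
  {
    lm(Petal.Length ~ as.factor(Species), data = sample_rows)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(n_levels)
print(err)
```

The first 10 rows of iris are all setosa, so the factor has a single level and lm raises the same "contrasts can be applied only to factors with 2 or more levels" message that appears in the worker log.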
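The sparklyr team suggested one of two options.
The first is to pass columns=c(Species="factor").
The second is to raise the number of rows used for schema inference in the connection config: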
config <- spark_config()
config$`sparklyr.apply.schema.infer` <- 150
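The config object only matters if it is passed to spark_connect before the table is created. The following is a minimal sketch of the second option end to end, not the answer's exact code. It assumes a local Spark installation and that broom is available to the R workers. The master = "local" setting, the table name "iris", and repartition = 1L are illustrative choices that do not come from the original answer.

```r
library(sparklyr)

# Infer the output schema from 150 rows instead of the default sample,
# so that all three Species values are present during inference.
config <- spark_config()
config$`sparklyr.apply.schema.infer` <- 150

# Hypothetical local connection; change master for a real cluster.
sc <- spark_connect(master = "local", config = config)

# Assumption: keep iris in a single partition so that each call to lm()
# sees every Species level, not just the rows of one partition.
iris_tbl <- sdf_copy_to(sc, iris, name = "iris", repartition = 1L, overwrite = TRUE)

result <- spark_apply(
  iris_tbl,
  function(e) broom::tidy(lm(Petal_Length ~ as.factor(Species), e)),
  names = c("term", "estimate", "std.error", "statistic", "p.value")
)

print(result)
spark_disconnect(sc)
```

A sample size of 150 covers every row of iris. For larger tables the same contrasts error can still occur if any single partition, or the inference sample, contains fewer than two levels of the predictor.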