
sparklyr + lm with a categorical predictor

Stack Overflow user
Asked 2021-06-10 06:58:37
Answers: 1 | Views: 24 | Followers: 0 | Votes: 2

I am trying to run lm with a categorical predictor on sparklyr. An example that works in plain R fails in sparklyr:

# this works
lm(Petal.Length ~ as.factor(Species), data = iris)

# this fails
spark_apply(
    iris_tbl,
    function(e) broom::tidy(lm(Petal_Length ~ as.factor(Species), e)),
    names = c("term", "estimate", "std.error", "statistic", "p.value"),
    # group_by = "Species"
    )

I am trying to mimic this example, simply replacing the independent variable with a categorical one.

Error log:

...
21/06/09 22:48:01 ERROR sparklyr: RScript (3130) terminated unexpectedly: contrasts can be applied only to factors with 2 or more levels 
21/06/09 22:48:01 ERROR sparklyr: RScript (3130) collected callstack: 
16: stop("contrasts can be applied only to factors with 2 or more levels")
15: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
14: model.matrix.default(mt, mf, contrasts)
13: model.matrix(mt, mf, contrasts)
12: lm(Petal_Length ~ as.factor(Species), e)
11: broom::tidy(lm(Petal_Length ~ as.factor(Species), e)) 
(21/06/09 22:48:01 INFO sparklyr: Session (3130) is shutting down with expected SocketException,java.net.SocketException: Socket closed)
21/06/09 22:48:01 ERROR sparklyr: Worker (3130) failed to complete R process
(21/06/09 22:48:01 ERROR sparklyr: Worker (3130) failed to run rscript: ,java.lang.Exception: sparklyr worker rscript failure with status 255, check worker logs for details.)
21/06/09 22:48:01 INFO sparklyr: Worker (3130) completed wait using lock for RScript
21/06/09 22:48:01 ERROR Executor: Exception in task 0.0 in stage 704.0 (TID 5010)
java.lang.Exception: sparklyr worker rscript failure with status 255, check worker logs for details.
    at sparklyr.Rscript.init(rscript.scala:83)
    at sparklyr.WorkerApply$$anon$2.run(workerapply.scala:125)
21/06/09 22:48:01 INFO sparklyr: Session (3130) is terminating backend
21/06/09 22:48:01 ERROR TaskSetManager: Task 0 in stage 704.0 failed 1 times; aborting job
...

1 Answer

Stack Overflow user

Answered 2021-09-23 19:46:42

Linking the GitHub issue where this problem was resolved.

https://github.com/sparklyr/sparklyr/issues/3139

The sparklyr team suggested one of two options:

  1. You can set the columns argument to columns = c(Species = "factor").
  2. You can set the number of records used to infer the schema:

config <- spark_config()
config$`sparklyr.apply.schema.infer` <- 150
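
A possible reading of why the second option helps, sketched in plain R with no Spark involved. It assumes that schema inference runs the function on only the first few rows (10 is taken here as the default, which is an assumption) and that Species reaches the worker as a character column. The helper fit_message is hypothetical and exists only for this illustration. The first 10 rows of iris are all "setosa", so as.factor(Species) has a single level and lm stops with the same message that appears in the error log.

```r
# Hypothetical stand-in for the rows a schema-inference pass might see:
# the first 10 rows of iris, all of which belong to one species.
sample_rows <- head(iris, 10)
# Assumption: the worker receives Species as a string, not a factor.
sample_rows$Species <- as.character(sample_rows$Species)

# Fit the model and report either the number of coefficients or the error text.
fit_message <- function(df) {
  tryCatch({
    fit <- lm(Petal.Length ~ as.factor(Species), data = df)
    paste("ok:", length(coef(fit)), "coefficients")
  }, error = function(e) conditionMessage(e))
}

small_result <- fit_message(sample_rows)  # one level only, so lm fails
full_result  <- fit_message(iris)         # all 150 rows, three levels

print(small_result)
print(full_result)
```

With 150 records used for inference, the sample covers every row of iris and therefore all three species. For the setting to take effect, the config object presumably has to be passed to the connection, for example spark_connect(master = "local", config = config), before calling spark_apply.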
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/67912638
