
Unable to connect to sparklyr from RStudio

Stack Overflow user
Asked 2018-11-13 04:13:11
Answers: 1 · Views: 626 · Followers: 0 · Votes: 2

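I am trying to connect to Spark from RStudio. We currently use the Cloudera Hadoop distribution, which runs Spark 2.2. I tested everything from the edge node, where I can create a Spark context and run my queries. Everything also worked from RStudio until yesterday, when we suddenly started hitting a problem there.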

library(dplyr)
library(sparklyr)

# Spark settings passed to spark_connect()
config <- spark_config()
config$spark.driver.memory <- "8G"
config$spark.executor.memory <- "8G"
# As posted; not a standard Spark property (possibly meant spark.executor.instances)
config$spark.executor.executor <- "2"
config$spark.executor.cores <- "4"
config$spark.kryoserializer.buffer.max <- "2000m"
config$spark.driver.maxResultSize <- "4G"
config$spark.akka.frameSize <- "768"

# Connect through YARN in client mode using the Cloudera Spark 2 parcel
sc <- spark_connect(
  master = "yarn-client",
  version = "2.2.0",
  config = config,
  spark_home = "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2"
)
The connection fails with the following error:
Error in force(code) : 
  Failed while connecting to sparklyr to port (8880) for sessionid (14727): Sparklyr gateway did not respond while retrieving ports information after 60 seconds
    Path: /opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/bin/spark-submit
    Parameters: --class, sparklyr.Shell, '/usr/lib64/R/library/sparklyr/java/sparklyr-2.2-2.11.jar', 8880, 14727
    Log: /tmp/RtmpoNJQEH/file151b437c0313b_spark.log


---- Output Log ----
18/11/12 13:54:50 INFO sparklyr: Session (14727) is starting under 127.0.0.1 port 8880
18/11/12 13:54:50 INFO sparklyr: Session (14727) found port 8880 is not available
18/11/12 13:54:50 INFO sparklyr: Backend (14727) found port 8884 is available
18/11/12 13:54:50 INFO sparklyr: Backend (14727) is registering session in gateway
18/11/12 13:54:50 INFO sparklyr: Backend (14727) is waiting for registration in gateway

---- Error Log ----

I also verified the sparklyr version; it is 0.9.2.

Can you tell me what might be going wrong?

1 Answer

Stack Overflow user

Answered 2019-01-03 22:28:21

Can you try this?

library(httr)
library(sparklyr)

# Point sparklyr at the Spark 2 parcel and its YARN client configuration
Sys.setenv(SPARK_HOME = "/opt/cloudera/parcels/SPARK2/lib/spark2")
Sys.setenv(YARN_CONF_DIR = "/opt/cloudera/parcels/SPARK2/lib/spark2/conf/yarn-conf/")

# Ask spark-submit to run in client deploy mode
config <- list("sparklyr.shell.deploy-mode" = "client")

# Connect while GSS-Negotiate authentication is set for httr requests
httr::with_config(
  config = httr::authenticate(user = ":", password = "", type = "gssnegotiate"),
  sc <- spark_connect(master = "yarn-client", version = "2.2.0", config = config)
)
sc

If SSL and Kerberos are enabled, you may need to use this instead:

library(httr)
library(sparklyr)

# Trust the cluster's root CA for httr requests (PEM format)
httr::set_config(httr::config(cainfo = "/opt/cloudera/security/global_cacerts.pem"))

# Point sparklyr at the Spark 2 parcel and its YARN client configuration
Sys.setenv(SPARK_HOME = "/opt/cloudera/parcels/SPARK2/lib/spark2")
Sys.setenv(YARN_CONF_DIR = "/opt/cloudera/parcels/SPARK2/lib/spark2/conf/yarn-conf/")

# Kerberos keytab and principal, plus client deploy mode, for spark-submit
config <- list(
  "sparklyr.shell.keytab" = "/PATH/PATH.keytab",
  "sparklyr.shell.principal" = "user@DOMAIN.COM",
  "sparklyr.shell.deploy-mode" = "client"
)

# Connect while GSS-Negotiate authentication is set for httr requests
httr::with_config(
  config = httr::authenticate(user = ":", password = "", type = "gssnegotiate"),
  sc <- spark_connect(master = "yarn-client", version = "2.2.0", config = config)
)
sc

Note: replace cainfo with the path to your root CA in PEM format, specify the user's keytab in sparklyr.shell.keytab, and specify the UPN (user principal name) in sparklyr.shell.principal.
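
Before connecting, it may help to confirm that the files and directories named in the configuration are readable from the RStudio host. The following is only a sketch in base R: `check_paths` is a hypothetical helper (it is not part of sparklyr or httr), and the paths are the placeholders used in this answer, so substitute your own.

```r
# Hypothetical helper: report, for each named path, whether it exists and is readable.
# Base R only; it does not contact Spark, YARN, or Kerberos.
check_paths <- function(paths) {
  # file.access(mode = 4) returns 0 when the path can be read, -1 otherwise
  ok <- file.access(unname(paths), mode = 4) == 0
  names(ok) <- names(paths)
  ok
}

# Placeholder paths taken from the answer; replace with your own
paths <- c(
  spark_home    = "/opt/cloudera/parcels/SPARK2/lib/spark2",
  yarn_conf_dir = "/opt/cloudera/parcels/SPARK2/lib/spark2/conf/yarn-conf/",
  cainfo        = "/opt/cloudera/security/global_cacerts.pem",
  keytab        = "/PATH/PATH.keytab"
)

status <- check_paths(paths)
not_readable <- names(status)[!status]
if (length(not_readable) > 0) {
  message("Not readable: ", paste(not_readable, collapse = ", "))
}
```

A path reported as not readable would not by itself explain the gateway timeout in the question, but it rules out one class of setup problems before looking at the Spark log.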

Votes: 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/53269419
