library(sparklyr)
library(dplyr)
home <- "/usr/hdp/current/spark-client"
sc <- spark_connect(master = "yarn-client", spark_home = home, version = "1.6.2")
readFromSpark <- spark_read_csv(sc, name = "test", path = "hdfs://hostname/user/test.csv", header = TRUE)
I have successfully accessed HDFS with sparklyr. But how do I access Hive tables/commands through sparklyr? I need to store this data frame into Hive.
Posted on 2017-05-31 13:49:03
AFAIK, sparklyr has no built-in function for creating databases/tables directly. But you can use DBI to create them:
library(DBI)
iris_preview <- dbExecute(sc, "CREATE EXTERNAL TABLE...")
Posted on 2018-04-26 02:20:38
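To actually persist the data frame into Hive (the original question), a minimal sketch using standard sparklyr/DBI calls; it assumes the Spark connection `sc` and the `readFromSpark` data frame from the question above, and the database name `analytics` is a placeholder:

```r
library(sparklyr)
library(DBI)

# Assumes `sc` is an existing Spark connection and `readFromSpark`
# is the Spark DataFrame loaded from HDFS in the question.

# Create the target database via DBI (plain HiveQL), then switch to it.
dbExecute(sc, "CREATE DATABASE IF NOT EXISTS analytics")
dbExecute(sc, "USE analytics")

# Persist the Spark DataFrame as a Hive table.
spark_write_table(readFromSpark, name = "test_hive", mode = "overwrite")

# Verify by querying the table back through Hive.
dbGetQuery(sc, "SELECT COUNT(*) AS n FROM test_hive")
```

`spark_write_table()` writes through the session's Hive metastore, so the table remains visible to other Hive clients after the Spark session ends.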
Here is how I did it:
Setup:
# RxSpark / RxHiveData come from RevoScaleR (Microsoft ML Server), not sparklyr
cc <- RxSpark(nameNode = hdfs_host(myADL))
rxSetComputeContext(cc)
myXDFname <- 'something'
hivTbl <- RxHiveData(table = myXDFname)

sc <- spark_connect('yarn-client')

tbl_cache(sc, myXDFname)
mytbl <- tbl(sc, myXDFname)
Now do something with it:
mytbl %>% head
mytbl %>%
  filter(rlike(<txt col>, pattern)) %>%
  group_by(something) %>%
  tally() %>%
  collect() %>%         # this is important: pull the result into local R before plotting
  ggplot(aes(...)) +    # ggplot2 layers are combined with `+`, not `%>%`
  geom_triforce(...)
Source: https://stackoverflow.com/questions/43273623