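I am trying to save the output of an RDD to Elasticsearch. When I try to send it, I get an error, even after including several elasticsearch-spark libraries. I am new to Elasticsearch, so any help would be greatly appreciated. Thanks.
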
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object ElasticSpark {
  def main(args: Array[String]) {
    val logfile = "/Users/folder/Desktop/logfile.rtf";
    // the master can be local[*] (all CPU cores), a Spark cluster, Mesos, ...
    val conf = new SparkConf().setMaster("local[1]").setAppName("RddTest");
    conf.set("es.index.auto.create", "true")
    val sc = new SparkContext(conf);
    val logdata = sc.textFile(logfile); // number of partitions
    val NumA = logdata.filter(line => line.contains("a")).count();
    val wordcount = logdata.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b);
    println(wordcount.collect()); // doubt
    wordcount.saveAsTextFile("/Users/folder/Desktop/sample") // success
    wordcount.saveToEs("spark/docs")
  }
}

Error:
Error:(21, 15) value saveToEs is not a member of org.apache.spark.rdd.RDD[(String, Int)]
    wordcount.saveToEs("spark/docs")
              ^
Error:(6, 12) object elasticsearch is not a member of package org
import org.elasticsearch.spark._
           ^

Posted on 2016-04-24 05:18:06

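Elasticsearch support is not part of the Spark distribution; it is part of elasticsearch-hadoop, so you need to provide that dependency yourself. Both compiler errors have the same cause: the import cannot be resolved, so the implicit that adds saveToEs to an RDD never comes into scope. If you use Maven, add this to your pom.xml:
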
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>2.2.0</version>
</dependency>

For sbt, add to build.sbt:

libraryDependencies += "org.elasticsearch" % "elasticsearch-hadoop" % "2.2.0" % "compile"

resolvers ++= Seq(
  "clojars" at "https://clojars.org/repo",
  "conjars" at "http://conjars.org/repo",
  "plugins" at "http://repo.spring.io/plugins-release",
  "sonatype" at "http://oss.sonatype.org/content/groups/public/"
)

https://stackoverflow.com/questions/36816313
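
For reference, a complete build.sbt along these lines might look like the sketch below. The project name and the Scala and Spark versions are assumptions, not taken from the question; adjust them to your own setup, and keep the Scala version consistent with the one your Spark build uses. Only the elasticsearch-hadoop coordinates come from the answer.

```scala
// build.sbt (sketch; the name and the Scala/Spark versions are assumptions)
name := "elastic-spark"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Spark itself; %% appends the Scala binary version (_2.10) to the artifact name
  "org.apache.spark" %% "spark-core" % "1.6.1",
  // supplies the org.elasticsearch.spark package and the implicit saveToEs
  "org.elasticsearch" % "elasticsearch-hadoop" % "2.2.0"
)

// keep the resolvers block from the answer here as well, so that the
// transitive dependencies of elasticsearch-hadoop can be resolved
```

After changing the build file, reload or re-import the project in the IDE so the new dependency ends up on the compile classpath; until then the same two errors will keep appearing.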