When using combineByKey, I get a type mismatch error, as shown below:
scala> rdd.map(x => (x._1, x._2))
.combineByKey( (x: Int) => x,
(acc: SortedSet[Int], x: Int) => (acc += x),
(acc1: SortedSet[Int], acc2: SortedSet[Int]) => (acc1 ++= acc2))
<console>:29: error: type mismatch;
found : (scala.collection.mutable.SortedSet[Int], Int) => scala.collection.mutable.SortedSet[Int]
required: (Any, Int) => Any
rdd.map(x => (x._1, x._2)).combineByKey( (x: Int) => x, (acc: SortedSet[Int], x: Int) => (acc += x), (acc1: SortedSet[Int], acc2: SortedSet[Int]) => (acc1 ++= acc2))
^
<console>:29: error: type mismatch;
found : (scala.collection.mutable.SortedSet[Int], scala.collection.mutable.SortedSet[Int]) => scala.collection.mutable.SortedSet[Int]
required: (Any, Any) => Any
rdd.map(x => (x._1, x._2)).combineByKey( (x: Int) => x, (acc: SortedSet[Int], x: Int) => (acc += x), (acc1: SortedSet[Int], acc2: SortedSet[Int]) => (acc1 ++= acc2))
Why can't Scala treat scala.collection.mutable.SortedSet[Int] as Any?
Here is the code I tried:
import scala.collection.mutable.SortedSet
val data = Array((1, 1, 1),
(1, 1, 2),
(1, 1, 3),
(1, 2, 1),
(1, 2, 2),
(1, 2, 3),
(2, 1, 1),
(2, 1, 2),
(2, 1, 3),
(2, 2, 1),
(2, 2, 2),
(2, 2, 3))
val rdd = sc.parallelize(data)
rdd.map(x => (x._1, x._2))
.combineByKey( (x: Int) => x,
(acc: SortedSet[Int], x: Int) => (acc += x),
(acc1: SortedSet[Int], acc2: SortedSet[Int]) => (acc1 ++= acc2))
I expect to get ((1,(1,2)),(2,(1,2))), where the value in each key/value pair contains no duplicate elements.
Posted on 2019-04-01 05:58:44
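The first function (createCombiner) must return a sorted set, because it is the function that tells combineByKey how to construct the combiner. In the question it returns a plain Int, so the combiner type cannot be SortedSet[Int]. Something like this should work: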
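import scala.collection.mutable  // needed for the mutable.TreeSet reference below
rdd.map(x => (x._1, x._2)).combineByKey(
  // createCombiner: wrap the first value seen for a key in a new sorted set
  (x: Int) => new mutable.TreeSet[Int] += x,
  // mergeValue: add another value to the set within a partition
  (acc: SortedSet[Int], x: Int) => (acc += x),
  // mergeCombiners: merge the sets built on different partitions
  (acc1: SortedSet[Int], acc2: SortedSet[Int]) => (acc1 ++= acc2))
https://stackoverflow.com/questions/55448759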
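To see why the original call failed, recall that combineByKey takes three functions that share one combiner type C: createCombiner: V => C, mergeValue: (C, V) => C and mergeCombiners: (C, C) => C. With createCombiner returning Int and the other two using SortedSet[Int], the error message suggests the compiler fell back to C = Any. Function types are contravariant in their parameters, so a function that only accepts a SortedSet[Int] cannot stand in for one that must accept Any. The sketch below reproduces the three-function shape without Spark. The helper combineByKeyLocal and the sample data are hypothetical, written only for illustration, and the combiner type is passed explicitly so nothing is left to inference.

```scala
import scala.collection.mutable.SortedSet

object CombineByKeySketch {
  // Hypothetical local stand-in for combineByKey: same three-function shape,
  // but it works on plain Seqs acting as "partitions" instead of an RDD.
  def combineByKeyLocal[K, V, C](partitions: Seq[Seq[(K, V)]])(
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C): Map[K, C] = {
    // Step 1: combine the values inside each partition.
    val perPartition: Seq[Map[K, C]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
        acc.get(k) match {
          case Some(c) => acc.updated(k, mergeValue(c, v))
          case None    => acc.updated(k, createCombiner(v))
        }
      }
    }
    // Step 2: merge the per-partition combiners key by key.
    perPartition.foldLeft(Map.empty[K, C]) { (merged, m) =>
      m.foldLeft(merged) { case (acc, (k, c)) =>
        acc.get(k) match {
          case Some(existing) => acc.updated(k, mergeCombiners(existing, c))
          case None           => acc.updated(k, c)
        }
      }
    }
  }

  // Illustrative data: both keys appear in both partitions, with duplicate values.
  val data = Seq((1, 1, 1), (2, 1, 1), (1, 2, 1), (2, 2, 3), (1, 1, 2), (2, 2, 1))
  val pairs = data.map(x => (x._1, x._2))
  val partitions = Seq(pairs.take(3), pairs.drop(3))

  // All three functions agree on C = SortedSet[Int], so the call type-checks.
  val result: Map[Int, SortedSet[Int]] =
    combineByKeyLocal[Int, Int, SortedSet[Int]](partitions)(
      (x: Int) => SortedSet(x),
      (acc: SortedSet[Int], x: Int) => acc += x,
      (a: SortedSet[Int], b: SortedSet[Int]) => a ++= b)

  def main(args: Array[String]): Unit =
    println(result.map { case (k, v) => k -> v.toList })
}
```

The same idea may carry over to the Spark call: writing combineByKey[SortedSet[Int]](...) with a createCombiner of (x: Int) => SortedSet(x) states the intended combiner type up front, so a mismatch is reported against SortedSet[Int] rather than Any.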