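I use the cartesian function to generate a list of N pairs of values.
I then map over these values to generate a distance metric between each pair of users: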
val cartesianUsers: org.apache.spark.rdd.RDD[(distance.classes.User, distance.classes.User)] = users.cartesian(users)
cartesianUsers.map(m => manDistance(m._1, m._2))

This works as expected.
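For readers who want to see the pairwise step in isolation, here is a minimal Spark-free sketch. The question does not show `User` or `manDistance`, so both definitions below are assumptions: `User` is a hypothetical stand-in for `distance.classes.User`, and `manDistance` is assumed to be the Manhattan (L1) distance over a numeric feature vector. The for-comprehension plays the role of `users.cartesian(users)` on a local collection.

```scala
object PairwiseDistanceSketch {
  // Hypothetical stand-in for distance.classes.User (not shown in the question).
  final case class User(id: String, features: Vector[Double])

  // Assumed meaning of manDistance: Manhattan (L1) distance,
  // i.e. the sum of absolute coordinate differences.
  def manDistance(a: User, b: User): Double = {
    require(a.features.length == b.features.length, "feature vectors must have equal length")
    a.features.zip(b.features).map { case (x, y) => math.abs(x - y) }.sum
  }

  // Local equivalent of users.cartesian(users).map(...):
  // every ordered pair, including each user paired with itself.
  def pairwise(users: Seq[User]): Seq[((String, String), Double)] =
    for (u <- users; v <- users) yield ((u.id, v.id), manDistance(u, v))

  def main(args: Array[String]): Unit = {
    val users = Seq(User("a", Vector(1.0, 2.0)), User("b", Vector(4.0, 6.0)))
    pairwise(users).foreach(println)
  }
}
```

Note that a Cartesian product of a collection with itself yields N * N ordered pairs for N users, so self-pairs and both orderings of each pair are included.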
Using the Spark Streaming library, I create a DStream and then map over it:
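val customReceiverStream: ReceiverInputDStream[String] = ssc.receiverStream....
customReceiverStream.foreachRDD(m => {
  // m is the RDD for the current batch; count() returns its number of elements
  println("size is " + m.count())
})

I could use the cartesian function inside customReceiverStream.foreachRDD, but according to the documentation at http://spark.apache.org/docs/1.2.0/streaming-programming-guide.htm that is not its intended use: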
foreachRDD(func): The most generic output operator that applies a function, func, to each RDD generated from the stream. This function should push the data in each RDD to a external system, like saving the RDD to files, or writing it over the network to a database. Note that the function func is executed in the driver process running the streaming application, and will usually have RDD actions in it that will force the computation of the streaming RDDs.
How can I compute the Cartesian product of a DStream? Or am I misunderstanding how DStreams are meant to be used?
Answered 2015-03-13 16:11:20
I was not aware of the transform method:

cartesianUsers.transform(car => car.cartesian(car))

There is a good talk that also mentions the transform function at around 17:00: https://www.youtube.com/watch?v=g171ndOHgJ0
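As a rough end-to-end illustration of the transform approach, the sketch below applies `cartesian` to each batch RDD of a DStream. Several things here are assumptions rather than part of the original post: the custom receiver is replaced by `queueStream` fed with one made-up batch, the names `CartesianStreamSketch` and `selfCartesian` are hypothetical, and `awaitTerminationOrTimeout` assumes a Spark version newer than the 1.2.0 release linked in the question.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object CartesianStreamSketch {
  // Pairs every element of a batch with every element of the same batch.
  // transform exposes the underlying RDD, so any RDD-to-RDD operation can be used.
  def selfCartesian(stream: DStream[String]): DStream[(String, String)] =
    stream.transform(rdd => rdd.cartesian(rdd))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("cartesian-sketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Stand-in for the custom receiver: a single batch of three made-up records.
    val batches = mutable.Queue[RDD[String]](ssc.sparkContext.parallelize(Seq("a", "b", "c")))
    val input: DStream[String] = ssc.queueStream(batches)

    // print() is an output operation; the batch above yields 3 x 3 = 9 pairs.
    selfCartesian(input).print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

The Cartesian product here is computed within each batch only; pairing records across different batches would need a windowed or stateful operation, which this sketch does not attempt.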
https://stackoverflow.com/questions/29034825