I have a DataFrame that represents the edges of a graph; this is its schema:
root
 |-- src: string (nullable = true)
 |-- dst: string (nullable = true)
 |-- relationship: struct (nullable = false)
 |    |-- business_id: string (nullable = true)
 |    |-- normalized_influence: double (nullable = true)

I want to convert it to an RDD[Edge] so that I can use Pregel, and my difficulty is with the "relationship" attribute. How can I convert it?
Posted on 2017-09-24 14:39:18
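Edge is a parameterized class. This means that, besides the source and destination ids, you can store whatever you like on each edge. In your case it might be an Edge[Relationship]. You can use case classes to map both your DataFrame and your RDD[Edge[Relationship]].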
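import scala.util.hashing.MurmurHash3
import org.apache.spark.graphx.Edge
import org.apache.spark.rdd.RDD
import spark.implicits._ // assumes a SparkSession named `spark`; supplies the encoder for df.as[MyEdge]

case class Relationship(business_id: String, normalized_influence: Double)
case class MyEdge(src: String, dst: String, relationship: Relationship)

val edges: RDD[Edge[Relationship]] = df.as[MyEdge].rdd.map { edge =>
  Edge(
    // VertexId is a Long, so the string ids have to be hashed.
    // Note: stringHash returns a 32-bit Int, so distinct strings can collide.
    MurmurHash3.stringHash(edge.src).toLong,
    MurmurHash3.stringHash(edge.dst).toLong,
    edge.relationship
  )
}

https://stackoverflow.com/questions/46388953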
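Since the goal in the question is to run Pregel, here is a minimal sketch of how an RDD[Edge[Relationship]] like the one above might be used. It is only an illustration: the helper name maxIncomingInfluence, the Double vertex attribute, and the choice of computation (the largest normalized_influence over each vertex's incoming edges) are assumptions made for this example and are not part of the original answer. It also assumes normalized_influence is never null.

```scala
import org.apache.spark.graphx.{Edge, EdgeTriplet, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Mirrors the case class from the answer above.
case class Relationship(business_id: String, normalized_influence: Double)

// Hypothetical helper: for every vertex, find the largest
// normalized_influence among its incoming edges (0.0 if it has none).
def maxIncomingInfluence(
    edges: RDD[Edge[Relationship]],
    maxIterations: Int = 10): Graph[Double, Relationship] = {

  // Build the graph from the edges alone; every vertex starts at 0.0.
  val graph: Graph[Double, Relationship] = Graph.fromEdges(edges, 0.0)

  graph.pregel(initialMsg = 0.0, maxIterations = maxIterations)(
    // Vertex program: keep the largest value seen so far.
    (id: VertexId, current: Double, incoming: Double) =>
      math.max(current, incoming),
    // Send the edge's influence to the destination only if it is an improvement.
    (triplet: EdgeTriplet[Double, Relationship]) =>
      if (triplet.attr.normalized_influence > triplet.dstAttr)
        Iterator((triplet.dstId, triplet.attr.normalized_influence))
      else
        Iterator.empty,
    // Merge messages that arrive at the same vertex.
    (a: Double, b: Double) => math.max(a, b)
  )
}
```

Calling maxIncomingInfluence(edges).vertices would then give (VertexId, Double) pairs. The ids in that result are the hashed values, so if you need the original strings back you would have to keep your own lookup from hash to string.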