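I'm new to Spark and stuck on something, so any pointers would help. I'm trying to build JSON from a DataFrame I have, but the toJSON function doesn't get me what I need. My output DataFrame looks like this: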
+----------+------------+-------------+
|booking_id|status      |count(status)|
+----------+------------+-------------+
|132       |rent count  |6            |
|132       |rent booked |24           |
|132       |rent delayed|6            |
|134       |rent booked |34           |
|134       |rent delayed|21           |
+----------+------------+-------------+
The output I'm looking for is a DataFrame that contains the booking id and, as a single JSON object, each status with its count.
+----------+--------------------------------------------------------+
|booking_id|status_json                                             |
+----------+--------------------------------------------------------+
|132       |{ "rent count": 6, "rent booked": 24, "rent delayed": 6}|
|134       |{ "rent booked": 34, "rent delayed": 21}                |
+----------+--------------------------------------------------------+
Thanks in advance.
Answered 2020-06-15 11:55:19
On Spark 2.4 or later, use map_from_arrays.
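from pyspark.sql import functions as F

# Collect statuses and counts per booking, zip them into a map, and serialize the map as JSON
status_map = F.map_from_arrays(F.collect_list("status"), F.collect_list("count(status)"))
df.groupBy("booking_id").agg(F.to_json(status_map).alias("status_json")).show(truncate=False)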
#+----------+--------------------------------------------------+
#|booking_id|status_json                                       |
#+----------+--------------------------------------------------+
#|132       |{"rent count":6,"rent booked":24,"rent delayed":6}|
#|134       |{"rent booked":34,"rent delayed":21}              |
#+----------+--------------------------------------------------+
Answered 2020-06-15 13:17:19
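import org.apache.spark.sql.functions.{col, collect_list, map, to_json}
import spark.implicits._ // needed for toDF; assumes a SparkSession named `spark`, as in spark-shell

val sourceDF = Seq(
  (132, "rent count", 6),
  (132, "rent booked", 24),
  (132, "rent delayed", 6),
  (134, "rent booked", 34),
  (134, "rent delayed", 21)
).toDF("booking_id", "status", "count(status)")

// Build a one-entry map per row, collect the maps per booking, and serialize the array as JSON.
// The result is a JSON array of single-key objects, not one merged object.
val resDF = sourceDF
  .groupBy("booking_id")
  .agg(to_json(collect_list(map(col("status"), col("count(status)")))).alias("status_json"))

resDF.show(false)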
// +----------+--------------------------------------------------------+
// |booking_id|status_json                                             |
// +----------+--------------------------------------------------------+
// |132       |[{"rent count":6},{"rent booked":24},{"rent delayed":6}]|
// |134       |[{"rent booked":34},{"rent delayed":21}]                |
// +----------+--------------------------------------------------------+
https://stackoverflow.com/questions/62380879