Q: Converting a PySpark DataFrame to a nested dictionary

Stack Overflow user

Asked 2022-04-14 04:21:40

1 answer · 68 views · 0 followers · 0 votes

I have a PySpark DataFrame that I need to convert into the dictionary format shown below.

This is the PySpark data:

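from pyspark.sql import Row, SparkSession

# Reuse the active session or create one; the original snippet assumed `spark` already existed.
spark = SparkSession.builder.getOrCreate()

data = spark.createDataFrame([Row(name='harvest bowl', tenure='6+', count=4),
                              Row(name='harvest bowl', tenure='6-Mar', count=1),
                              Row(name='harvest bowl', tenure='2-Jan', count=5),
                              Row(name='fish taco', tenure='6+', count=1)])

data.show()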

Table output:

+------------+------+-----+
|        NAME|TENURE|count|
+------------+------+-----+
|harvest bowl|    6+|    4|
|   fish taco|    6+|    1|
|harvest bowl| 6-Mar|    1|
|harvest bowl| 2-Jan|    5|
+------------+------+-----+

I want to convert the PySpark DataFrame above into the following format.

{'fish taco': {'TENURE': {'6+': 1.0}}, 'harvest bowl': {'TENURE': {'6+': 4, '6-Mar': 1, '2-Jan': 5}}}

Can someone tell me how to do this with PySpark?

pyspark

python

dictionary

1 Answer

Stack Overflow user

Answered 2022-04-14 15:14:18

You can use map_from_arrays and collect_list.

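from pyspark.sql import functions as F

# Build one {tenure: count} map per name, then bring the result to the driver as pandas.
pdf = (data.groupby('name')
    .agg(F.map_from_arrays(F.collect_list('tenure'), F.collect_list('count'))
         .alias('tenure'))
    .toPandas())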

pdf
#         name                             tenure
# harvest bowl  {'6+': 4, '6-Mar': 1, '2-Jan': 5}
#    fish taco                          {'6+': 1}

Then use pandas to_dict to get the dictionary.

pdf.set_index('name').to_dict(orient='index')

# {'harvest bowl': {'tenure': {'6+': 4, '6-Mar': 1, '2-Jan': 5}},
#  'fish taco': {'tenure': {'6+': 1}}}
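
If the DataFrame is small enough to collect to the driver anyway, the same nesting could also be built in plain Python, without pandas. The following is only a minimal sketch, not part of the original answer: rows_to_nested_dict is a hypothetical helper name, and the list of tuples is a stand-in for what data.collect() would return (Row objects unpack like tuples), so the snippet runs without a Spark session. The inner_key parameter is an assumption added so the outer key can be spelled 'TENURE', as in the format the question asks for.

```python
from collections import defaultdict


def rows_to_nested_dict(rows, inner_key="tenure"):
    """Group (name, tenure, count) rows into {name: {inner_key: {tenure: count}}}."""
    grouped = defaultdict(dict)
    for name, tenure, count in rows:
        # A repeated (name, tenure) pair overwrites the earlier count here.
        grouped[name][tenure] = count
    return {name: {inner_key: tenures} for name, tenures in grouped.items()}


# Stand-in for data.collect(); hypothetical sample mirroring the question's rows.
rows = [
    ("harvest bowl", "6+", 4),
    ("harvest bowl", "6-Mar", 1),
    ("harvest bowl", "2-Jan", 5),
    ("fish taco", "6+", 1),
]

result = rows_to_nested_dict(rows, inner_key="TENURE")
print(result)
# {'harvest bowl': {'TENURE': {'6+': 4, '6-Mar': 1, '2-Jan': 5}}, 'fish taco': {'TENURE': {'6+': 1}}}
```

With a live session, the call would presumably be rows_to_nested_dict(data.collect(), inner_key='TENURE'). Both this sketch and the toPandas approach pull every row onto the driver, so they only suit results that fit in memory.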

Votes 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/71866483
