首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >有条件地在火花放电中添加新列

有条件地在火花放电中添加新列
EN

Stack Overflow用户
提问于 2021-12-27 18:43:11
回答 1查看 56关注 0票数 -1

我有一些来自json的示例数据,这些数据看起来类似于以下内容:

代码语言:javascript
复制
{ hero: axe, attribute: strength, active_abilities: [q, w, r], inactive_abilities: e }
{ hero: invoker, attribute: intelligence, active_abilities: [q, w, e, r, f, d], inactive_abilities: null }
{ hero: phantom assassin, attribute: agility, active_abilities: [q, w, e], inactive_abilities: r }
{ hero: life stealer, attribute: strength, active_abilities: [q, r], inactive_abilities: [w, e] }

我遇到的问题是,由于该列中数据类型的可变性,“inactive_abilities”列被读取为字符串。数据可能为null、单个字符串(如果只有一个能力)、一个数组(如果有多个能力)。最后,我希望有几个新的列,这些列都是基于“非活动能力”的数量。如果只有1或空能力,我想要一个新列inactive_ability,它只在有一个非活动能力时填充,如果没有或有多个非活动能力,则为null。那么我想要多个专栏,比如inactive_ability1,inactive_ability2,inactive_ability3等等.在数组持有>1值的情况下。因此,从上面的示例中,最终结果应该如下所示:

代码语言:javascript
复制
{ hero: axe, attribute: strength, active_abilities: [q, w, r], inactive_abilities: e , inactive_ability: e, inactive_ability1: null, inactive_ability2: null, inactive_ability3, null, inactive_ability4: null}
{ hero: invoker, attribute: intelligence, active_abilities: [q, w, e, r, f, d], inactive_abilities: null, inactive_ability: null, inactive_ability1: null, inactive_ability2: null, inactive_ability3, null, inactive_ability4: null }
{ hero: phantom assassin, attribute: agility, active_abilities: [q, w, e], inactive_abilities: r, inactive_ability: r, inactive_ability1: null, inactive_ability2: null, inactive_ability3, null, inactive_ability4: null }
{ hero: life stealer, attribute: strength, active_abilities: [q, r], inactive_abilities: [w, e], inactive_ability: null, inactive_ability1: w, inactive_ability2: e, inactive_ability3, null, inactive_ability4: null }

我不能假设'inactive_abilities‘的数量是固定的,但是如果它超过了4,其余的就可以忽略了。我遇到的问题是能够将字段转换为数组,并在适当的情况下读取它,然后根据上述条件创建和填充新列。

EN

回答 1

Stack Overflow用户

发布于 2021-12-27 20:14:49

字符串inactive_abilities列可以通过删除[]并通过,拆分字符串而转换为数组。

代码语言:javascript
复制
from pyspark.sql import functions as F

data = [{"hero": "axe", "attribute": "strength", "active_abilities": ["q", "w", "r"], "inactive_abilities": "e"},
        {"hero": "invoker", "attribute": "intelligence", "active_abilities": ["q", "w", "e", "r", "f", "d"],
         "inactive_abilities": None},
        {"hero": "phantom assassin", "attribute": "agility", "active_abilities": ["q", "w", "e"],
         "inactive_abilities": "r"},
        {"hero": "life stealer", "attribute": "strength", "active_abilities": ["q", "r"],
         "inactive_abilities": "[w, e]"}, ]

df = spark.createDataFrame(data)

array_select_expr = [
    F.when(F.size("parsed_inactive_abilities") > 1, F.trim(F.col("parsed_inactive_abilities")[i])).alias(
        f"inactive_ability{i}") if i > 0 else F.when(
        F.size("parsed_inactive_abilities") == 1, F.trim(F.col("parsed_inactive_abilities")[0])).alias(
        "inactive_abilities")
    for i in range(0, 5)]

(df.withColumn("parsed_inactive_abilities", F.split(F.regexp_replace(F.col("inactive_abilities"), "[\[\]]", ""), ","))
   .select("*", *array_select_expr)
   .drop("parsed_inactive_abilities").show())

输出

代码语言:javascript
复制
+------------------+------------+----------------+------------------+------------------+-----------------+-----------------+-----------------+-----------------+
|  active_abilities|   attribute|            hero|inactive_abilities|inactive_abilities|inactive_ability1|inactive_ability2|inactive_ability3|inactive_ability4|
+------------------+------------+----------------+------------------+------------------+-----------------+-----------------+-----------------+-----------------+
|         [q, w, r]|    strength|             axe|                 e|                 e|             null|             null|             null|             null|
|[q, w, e, r, f, d]|intelligence|         invoker|              null|              null|             null|             null|             null|             null|
|         [q, w, e]|     agility|phantom assassin|                 r|                 r|             null|             null|             null|             null|
|            [q, r]|    strength|    life stealer|            [w, e]|              null|                e|             null|             null|             null|
+------------------+------------+----------------+------------------+------------------+-----------------+-----------------+-----------------+-----------------+


# Value in array_select_expr

[Column<'CASE WHEN (size(parsed_inactive_abilities) = 1) THEN trim(parsed_inactive_abilities[0]) END AS inactive_abilities'>,
 Column<'CASE WHEN (size(parsed_inactive_abilities) > 1) THEN trim(parsed_inactive_abilities[1]) END AS inactive_ability1'>,
 Column<'CASE WHEN (size(parsed_inactive_abilities) > 1) THEN trim(parsed_inactive_abilities[2]) END AS inactive_ability2'>,
 Column<'CASE WHEN (size(parsed_inactive_abilities) > 1) THEN trim(parsed_inactive_abilities[3]) END AS inactive_ability3'>,
 Column<'CASE WHEN (size(parsed_inactive_abilities) > 1) THEN trim(parsed_inactive_abilities[4]) END AS inactive_ability4'>]
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/70499270

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档