Flatten + (~self-join) a Spark DataFrame with an array of structs in Scala

Stack Overflow user
Asked 2021-03-11 01:49:23
2 answers, 52 views, 0 followers, 0 votes

Input DataFrame:

{
  "F1" : "A",
  "F2" : "B",
  "F3" : [
            {
              "name" : "N1",
              "sf1" : "val_1",
              "sf2" : "val_2"
            },
            {
              "name" : "N2",
              "sf1" : "val_3",
              "sf2" : "val_4"
            }
         ],
  "F4" : {
        "SF1" : "val_5",
        "SF2" : "val_6",
        "SF3" : "val_7"
  }
}

Desired output:

[
  {
    "F1" : "A",
    "F2" : "B",

    "F3_name" : "N1",
    "F3_sf1" : "val_1",
    "F3_sf2" : "val_2",
    
    "F4_SF1" : "val_5",
    "F4_SF2" : "val_6",
    "F4_SF3" : "val_7"
  },
  {
    "F1" : "A",
    "F2" : "B",

    "F3_name" : "N2",
    "F3_sf1" : "val_3",
    "F3_sf2" : "val_4",
    
    "F4_SF1" : "val_5",
    "F4_SF2" : "val_6",
    "F4_SF3" : "val_7"
  }
]

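F3 is an array of structs. The new DataFrame should be flat, with the single input row turned into one or more rows depending on the number of items in F3 (2 rows in this example). The scalar columns F1 and F2 and the fields of F4 are repeated on every output row.

I am new to Spark and Scala. Any ideas on how to achieve this transformation would be very helpful.

Thanks!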

2 Answers

Stack Overflow user

Accepted answer

Posted 2021-03-11 03:25:41

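You can also start with explode, which turns each element of the F3 array into its own row. Then use a series of aliases (for example, $"F3.name" as "F3_name") to extract and rename the nested fields: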

scala> case class NameSF(name: String, sf1: String, sf2: String)
defined class NameSF

scala> case class SF(SF1: String, SF2: String, SF3: String)
defined class SF

scala> case class F(F1: String, F2: String, F3: Array[NameSF], F4: SF)
defined class F

scala> val elem = F("A",
     |              "B",
     |              Array(NameSF("N1", "val_1", "val_2"), NameSF("N2", "val_3", "val_4")),
     |              SF("val_5", "val_6", "val_7"))
elem: F = F(A,B,[LNameSF;@2939bfa0,SF(val_5,val_6,val_7))

scala> val df = spark.createDataset(Seq(elem)).toDF
df: org.apache.spark.sql.DataFrame = [F1: string, F2: string ... 2 more fields]

scala> df.withColumn("F3", explode($"F3")).select($"F1",
     |                                            $"F2",
     |                                            $"F3.name" as "F3_name",
     |                                            $"F3.sf1" as "F3_sf1",
     |                                            $"F3.sf2" as "F3_sf2",
     |                                            $"F4.SF1" as "F4_SF1",
     |                                            $"F4.SF2" as "F4_SF2",
     |                                            $"F4.SF3" as "F4_SF3").show
+---+---+-------+------+------+------+------+------+                            
| F1| F2|F3_name|F3_sf1|F3_sf2|F4_SF1|F4_SF2|F4_SF3|
+---+---+-------+------+------+------+------+------+
|  A|  B|     N1| val_1| val_2| val_5| val_6| val_7|
|  A|  B|     N2| val_3| val_4| val_5| val_6| val_7|
+---+---+-------+------+------+------+------+------+
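
The aliases above are written out by hand. When the structs have many fields, the same select list could instead be derived from the schema. The following is a minimal sketch under that assumption and is not part of the original answer: `FlattenSketch`, `prefixedFields`, and `flatten` are hypothetical names, F3 is modelled as a `Seq` rather than an `Array`, and field names are assumed to contain no dots. Note that `explode` drops rows whose array is null or empty; `explode_outer` keeps such rows and fills the struct with null.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StructType}

// Same shape as the data in the question.
case class NameSF(name: String, sf1: String, sf2: String)
case class SF(SF1: String, SF2: String, SF3: String)
case class F(F1: String, F2: String, F3: Seq[NameSF], F4: SF)

object FlattenSketch {
  // Hypothetical helper: a struct column `c` becomes one column per field,
  // named `c_<field>`; any other column is passed through unchanged.
  def prefixedFields(df: DataFrame, c: String): Seq[Column] =
    df.schema(c).dataType match {
      case st: StructType =>
        st.fieldNames.toSeq.map(f => col(s"$c.$f").as(s"${c}_$f"))
      case _ => Seq(col(c))
    }

  // Hypothetical helper: explode every array-of-struct column, then expand
  // every struct column one level deep. With more than one array column this
  // yields the cross product of their elements.
  def flatten(df: DataFrame): DataFrame = {
    val exploded = df.schema.fields.foldLeft(df) { (acc, field) =>
      field.dataType match {
        case ArrayType(_: StructType, _) =>
          acc.withColumn(field.name, explode(col(field.name)))
        case _ => acc
      }
    }
    val selectList = exploded.columns.toSeq.flatMap(c => prefixedFields(exploded, c))
    exploded.select(selectList: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      F("A", "B",
        Seq(NameSF("N1", "val_1", "val_2"), NameSF("N2", "val_3", "val_4")),
        SF("val_5", "val_6", "val_7"))
    ).toDF()

    flatten(df).show()
    spark.stop()
  }
}
```

For the sample row this should give the same eight columns as the hand-written select, with one output row per element of F3.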
Votes: 1

Stack Overflow user

Posted 2021-03-11 01:55:28

You can use inline to explode the F3 array and expand its struct fields into columns in one step, and F4.* to expand F4:

val df2 = df.selectExpr("F1","F2","inline(F3)","F4.*")

df2.show
+---+---+----+-----+-----+-----+-----+-----+
| F1| F2|name|  sf1|  sf2|  SF1|  SF2|  SF3|
+---+---+----+-----+-----+-----+-----+-----+
|  A|  B|  N1|val_1|val_2|val_5|val_6|val_7|
|  A|  B|  N2|val_3|val_4|val_5|val_6|val_7|
+---+---+----+-----+-----+-----+-----+-----+
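
The result above keeps the bare field names (name, sf1, SF1, ...) rather than the F3_ and F4_ prefixes the question asks for. Because Spark resolves column names case-insensitively by default, sf1 and SF1 would also be ambiguous if referenced by name afterwards. One possible way around both issues, sketched here as an assumption and not as part of the original answer, is to rename all columns by position with toDF. `InlineSketch` and `inlineWithPrefixes` are hypothetical names, and the column names F1 to F4 are taken from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StructType}

// Same shape as the data in the question.
case class NameSF(name: String, sf1: String, sf2: String)
case class SF(SF1: String, SF2: String, SF3: String)
case class F(F1: String, F2: String, F3: Seq[NameSF], F4: SF)

object InlineSketch {
  // Hypothetical helper: run the inline / F4.* select, then rename every
  // output column positionally so that no name lookup is needed.
  def inlineWithPrefixes(df: DataFrame): DataFrame = {
    // Field names of the struct inside the F3 array.
    val f3Fields = df.schema("F3").dataType match {
      case ArrayType(st: StructType, _) => st.fieldNames.toSeq
      case other => sys.error(s"F3 is not an array of structs: $other")
    }
    // Field names of the F4 struct.
    val f4Fields = df.schema("F4").dataType match {
      case st: StructType => st.fieldNames.toSeq
      case other => sys.error(s"F4 is not a struct: $other")
    }
    // Must match the order of the columns produced by selectExpr below.
    val newNames =
      Seq("F1", "F2") ++ f3Fields.map("F3_" + _) ++ f4Fields.map("F4_" + _)

    df.selectExpr("F1", "F2", "inline(F3)", "F4.*").toDF(newNames: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("inline-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      F("A", "B",
        Seq(NameSF("N1", "val_1", "val_2"), NameSF("N2", "val_3", "val_4")),
        SF("val_5", "val_6", "val_7"))
    ).toDF()

    inlineWithPrefixes(df).show()
    spark.stop()
  }
}
```

Renaming by position relies on the select list and the name list staying in the same order, so the two are built next to each other in a single function.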
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/66570205
