首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何根据条件在火花数据中获取值

如何根据条件在火花数据中获取值
EN

Stack Overflow用户
提问于 2020-05-02 08:26:14
回答 2查看 735关注 0票数 0

我有一个类似于DataFrame的below.The列值是格式- 1_3_del_8_3,它基本上是由"_ del_“分隔的两个值。这里我们将得到两个部分- 1_38_3。我们可以在这里略去第二部分。我正在寻找获得第一部分不是_的最后一列值的最佳方法。

另外,我在DF中有50+列。

样本DataFrame

代码语言:javascript
复制
+-----------+-----------+-----------+
|         c1|         c2|         c3|
+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|
+-----------+-----------+-----------+

预期结果

代码语言:javascript
复制
+-----------+-----------+-----------+-----------+
|         c1|         c2|         c3|C4Out      |
+-----------+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|   2_3     |
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|   2_5     |
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|   2_8     |
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|   7_3     |
+-----------+-----------+-----------+-----------+

我的选拔-

代码语言:javascript
复制
val df1=Seq(
("1_3_del_8_3","2_3_del_6_3","0_0_del_8_3"),
("2_9_del_4_3","0_0_del_4_3","2_5_del_4_3"),
("2_8_del_4_3","0_0_del_4_3","0_0_del_4_3"),
("5_3_del_4_3","2_3_del_4_3","7_3_del_4_3")
)toDF("c1","c2","c3")

然后我试着根据分隔符分割列,然后我被困在如何处理的问题上,我尝试了很多搜索,但在这里得到了一个小小的问题。

EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2020-05-02 11:35:45

检查下面的代码,以获得相同的结果在动态。

Logic

Step 1:将所有列的第一个值与空格合并,但包含"0_0“的列除外。

例如,以第一行- |1_3_del_8_3|2_3_del_6_3|0__del_8_3|合并除0_0以外的所有列的第一个值和空格1_3 2_3

使用空格- Step 2:拆分值- Array(1_3,2_3)

Step 3:从数组中获取最后一个值。- 2_3是第一行的值。

对于剩余的行,请遵循相同的方式。请检查下面的代码。

代码语言:javascript
复制
scala> val df = Seq(("1_3_del_8_3","2_3_del_6_3","0_0_del_8_3"),("2_9_del_4_3","0_0_del_4_3","2_5_del_4_3"),("2_8_del_4_3","0_0_del_4_3","0_0_del_4_3"),("5_3_del_4_3","2_3_del_4_3","7_3_del_4_3"))toDF("c1","c2","c3") // Creating DF.
df: org.apache.spark.sql.DataFrame = [c1: string, c2: string ... 1 more field]

scala> val exprs = columns.map(c => when(substring(col(s"${c}"),1,3) =!= "0_0",substring(col(s"${c}"),1,3))) // Creating expressions
exprs: Array[org.apache.spark.sql.Column] = Array(CASE WHEN (NOT (substring(c1, 1, 3) = 0_0)) THEN substring(c1, 1, 3) END, CASE WHEN (NOT (substring(c2, 1, 3) = 0_0)) THEN substring(c2, 1, 3) END, CASE WHEN (NOT (substring(c3, 1, 3) = 0_0)) THEN substring(c3, 1, 3) END)

scala> val array = (split(concat_ws(" ",exprs:_*)," ")) // Extracting Arrays
array: org.apache.spark.sql.Column = split(concat_ws( , CASE WHEN (NOT (substring(c1, 1, 3) = 0_0)) THEN substring(c1, 1, 3) END, CASE WHEN (NOT (substring(c2, 1, 3) = 0_0)) THEN substring(c2, 1, 3) END, CASE WHEN (NOT (substring(c3, 1, 3) = 0_0)) THEN substring(c3, 1, 3) END),  )

scala> val element = array((size(array)-1).cast("int")) // Extracting matched element
element: org.apache.spark.sql.Column = split(concat_ws( , CASE WHEN (NOT (substring(c1, 1, 3) = 0_0)) THEN substring(c1, 1, 3) END, CASE WHEN (NOT (substring(c2, 1, 3) = 0_0)) THEN substring(c2, 1, 3) END, CASE WHEN (NOT (substring(c3, 1, 3) = 0_0)) THEN substring(c3, 1, 3) END),  )[CAST((size(split(concat_ws( , CASE WHEN (NOT (substring(c1, 1, 3) = 0_0)) THEN substring(c1, 1, 3) END, CASE WHEN (NOT (substring(c2, 1, 3) = 0_0)) THEN substring(c2, 1, 3) END, CASE WHEN (NOT (substring(c3, 1, 3) = 0_0)) THEN substring(c3, 1, 3) END),  )) - 1) AS INT)]

scala> spark.time{ df.withColumn("c4",element).show} // showing final result.
+-----------+-----------+-----------+---+
|         c1|         c2|         c3| c4|
+-----------+-----------+-----------+---+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|2_3|
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|2_5|
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|2_8|
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|7_3|
+-----------+-----------+-----------+---+

Time taken: 20 ms
票数 1
EN

Stack Overflow用户

发布于 2020-05-02 09:21:04

您可以使用when()来有条件地检查用例:

when(Condition,value_if_condition_true).otherwise(value_if_condition_false)

根据您的用例,我们可以从右侧开始检查。如果c3满足条件,则将选择来自c3的值,否则它将签入c2,否则c1为null。

代码语言:javascript
复制
scala> df.show()
+-----------+-----------+-----------+
|         c1|         c2|         c3|
+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|
+-----------+-----------+-----------+

scala> df.withColumn("C4Out",
when(substring($"c3",1,3)!=="0_0",substring($"c3",9,11))
.when(substring($"c2",1,3)!=="0_0",substring($"c2",9,11))
.when(substring($"c1",1,3)!=="0_0",substring($"c1",9,11))
.otherwise(null)).show()
+-----------+-----------+-----------+-----------+
|         c1|         c2|         c3|C4Out      |
+-----------+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|   2_3     |
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|   2_5     |
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|   2_8     |
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|   7_3     |
+-----------+-----------+-----------+-----------+
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/61556443

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档