我需要编写代码来生成考试评估报告(给定来自多个学生的答案和他们的尝试),其中包括问题ID(作为列)和答案(作为它们的值)。我需要注意的是,学生(参与者)可以在单个评估和地理标签中回答一个或多个问题。
/** Input data */
val inputDf = Seq(
(1, "Question1Text", "Yes", "abcde1", 0, List("x1", "y1")),
(2, "Question2Text", "No", "abcde1", 0, List("x1", "y1")),
(3, "Question3Text", "3", "abcde1", 0, List("x1", "y1")),
(1, "Question1Text", "No", "abcde2", 0, List("x2", "y2")),
(2, "Question2Text", "Yes", "abcde2", 0, List("x2", "y2"))
).toDF("Qid", "Question", "AnswerText", "ParticipantID", "Assessment", "GeoTag")
println("Input:")
inputDf.show(false)我的解决方案是:
inputDf
.groupBy($"ParticipantID")
.pivot("Question")
.agg(first($"ParticipantID"))
.sort($"ParticipantID")但我得到的是:
Input:
+---+-------------+----------+-------------+----------+--------+
|Qid|Question |AnswerText|ParticipantID|Assessment|GeoTag |
+---+-------------+----------+-------------+----------+--------+
|1 |Question1Text|Yes |abcde1 |0 |[x1, y1]|
|2 |Question2Text|No |abcde1 |0 |[x1, y1]|
|3 |Question3Text|3 |abcde1 |0 |[x1, y1]|
|1 |Question1Text|No |abcde2 |0 |[x2, y2]|
|2 |Question2Text|Yes |abcde2 |0 |[x2, y2]|
+---+-------------+----------+-------------+----------+--------+
Expected:
+-------------+----------+--------+-----+-----+-----+
|ParticipantID|Assessment|GeoTag |Qid_1|Qid_2|Qid_3|
+-------------+----------+--------+-----+-----+-----+
|abcde1 |0 |[x1, y1]|Yes |No |3 |
|abcde2 |0 |[x2, y2]|No |Yes |null |
+-------------+----------+--------+-----+-----+-----+
Actual:
+-------------+-------------+-------------+-------------+
|ParticipantID|Question1Text|Question2Text|Question3Text|
+-------------+-------------+-------------+-------------+
|abcde1 |abcde1 |abcde1 |abcde1 |
|abcde2 |abcde2 |abcde2 |null |
+-------------+-------------+-------------+-------------+发布于 2020-06-15 18:45:09
你应该使用
inputDf
.groupBy($"ParticipantID", $"Assessment", $"GeoTag")
.pivot("Question")
.agg(first($"AnswerText"))
.sort($"ParticipantID")
.show(false)您可以在以后根据需要重命名该列。
输出:
+-------------+----------+--------+-------------+-------------+-------------+
|ParticipantID|Assessment|GeoTag |Question1Text|Question2Text|Question3Text|
+-------------+----------+--------+-------------+-------------+-------------+
|abcde1 |0 |[x1, y1]|Yes |No |3 |
|abcde2 |0 |[x2, y2]|No |Yes |null |
+-------------+----------+--------+-------------+-------------+-------------+发布于 2020-06-15 18:47:54
也许这是有用的-
/** Input data */
val inputDf = Seq(
(1, "Question1Text", "Yes", "abcde1", 0, List("x1", "y1")),
(2, "Question2Text", "No", "abcde1", 0, List("x1", "y1")),
(3, "Question3Text", "3", "abcde1", 0, List("x1", "y1")),
(1, "Question1Text", "No", "abcde2", 0, List("x2", "y2")),
(2, "Question2Text", "Yes", "abcde2", 0, List("x2", "y2"))
).toDF("Qid", "Question", "AnswerText", "ParticipantID", "Assessment", "GeoTag")
println("Input:")
inputDf.show(false)
inputDf.printSchema()
/**
* Input:
* +---+-------------+----------+-------------+----------+--------+
* |Qid|Question |AnswerText|ParticipantID|Assessment|GeoTag |
* +---+-------------+----------+-------------+----------+--------+
* |1 |Question1Text|Yes |abcde1 |0 |[x1, y1]|
* |2 |Question2Text|No |abcde1 |0 |[x1, y1]|
* |3 |Question3Text|3 |abcde1 |0 |[x1, y1]|
* |1 |Question1Text|No |abcde2 |0 |[x2, y2]|
* |2 |Question2Text|Yes |abcde2 |0 |[x2, y2]|
* +---+-------------+----------+-------------+----------+--------+
*
* root
* |-- Qid: integer (nullable = false)
* |-- Question: string (nullable = true)
* |-- AnswerText: string (nullable = true)
* |-- ParticipantID: string (nullable = true)
* |-- Assessment: integer (nullable = false)
* |-- GeoTag: array (nullable = true)
* | |-- element: string (containsNull = true)
*/
inputDf
.groupBy($"ParticipantID", $"Assessment", $"GeoTag")
.pivot("Qid")
.agg(
first($"AnswerText").as("Q")
)
.orderBy($"ParticipantID")
.show(false)
/**
* +-------------+----------+--------+---+---+----+
* |ParticipantID|Assessment|GeoTag |1 |2 |3 |
* +-------------+----------+--------+---+---+----+
* |abcde1 |0 |[x1, y1]|Yes|No |3 |
* |abcde2 |0 |[x2, y2]|No |Yes|null|
* +-------------+----------+--------+---+---+----+
*/https://stackoverflow.com/questions/62385976
复制相似问题