Here is my code:
%spark.pyspark
df_principalBody = spark.sql("""
    SELECT
        gtin
        , principalBodyConstituents
        --, principalBodyConstituents.coatings.materialType.value
    FROM
        v_df_source""")
df_principalBody.createOrReplaceTempView("v_df_principalBody")
df_principalBody.collect()

Here is the output:
[Row(gtin='7617014161936', principalBodyConstituents=[Row(coatings=[Row(materialType=Row(value='003', valueRange='405')

How can I read the value and valueRange fields in relational format? I tried explode and flatten, but it doesn't work.
Part of my JSON:
{
"gtin": "7617014161936",
"timePeriods": [
{
"fractionData": {
"principalBody": {
"constituents": [
{
"coatings": [
{
"materialType": {
"value": "003",
"valueRange": "405"
},
"percentage": 0.1
}
],
...

Posted on 2022-03-03 20:52:47
You can use data_dict.items() to list the key/value pairs.
I used part of your JSON as follows -
import json

str1 = """{"gtin": "7617014161936","timePeriods": [{"fractionData": {"principalBody": {"constituents": [{"coatings": [
{
    "materialType": {
        "value": "003",
        "valueRange": "405"
    },
    "percentage": 0.1
}
]}]}}}]}"""
res = json.loads(str1)
res_dict = res['timePeriods'][0]['fractionData']['principalBody']['constituents'][0]['coatings'][0]['materialType']
df = spark.createDataFrame(data=res_dict.items())
df.show()

Output:
+----------+---+
| _1| _2|
+----------+---+
| value|003|
|valueRange|405|
+----------+---+

You can even specify a schema:
from pyspark.sql.types import *

df = spark.createDataFrame(res_dict.items(),
                           schema=StructType(fields=[
                               StructField("key", StringType()),
                               StructField("value", StringType())]))
df.show()

resulting in:
+----------+-----+
| key|value|
+----------+-----+
| value| 003|
|valueRange| 405|
+----------+-----+

https://stackoverflow.com/questions/71321118
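Note that the hard-coded [0] index chain in the answer only reaches the first coating. A small sketch that extends the same idea (same str1 JSON as the answer, compacted; the variable names are illustrative) and walks every timePeriod, constituent, and coating with a list comprehension:

```python
import json

str1 = """{"gtin": "7617014161936","timePeriods": [{"fractionData": {"principalBody": {"constituents": [{"coatings": [
{"materialType": {"value": "003", "valueRange": "405"}, "percentage": 0.1}
]}]}}}]}"""
res = json.loads(str1)

# Collect (value, valueRange) from every coating, not just coatings[0].
rows = [
    (c["materialType"]["value"], c["materialType"]["valueRange"])
    for tp in res["timePeriods"]
    for const in tp["fractionData"]["principalBody"]["constituents"]
    for c in const["coatings"]
]
print(rows)  # one (value, valueRange) tuple per coating
```

The resulting list of tuples can then be handed to spark.createDataFrame(rows, ["value", "valueRange"]) to get one relational row per coating.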