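I'm new to databases. I have a SQL database table from which I create my data, and one of its columns holds a JSON string. I need to explode the nested JSON into multiple columns. This post and this post got me to where I am now.
Example JSON: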
{ "Module": { "PCBA Serial Number": "G7456789", "Manufacturing Designator": "DISNEY", "Firmware Version": "0.0.0", "Hardware Revision": "46858", "Manufacturing Date": "10/17/2018 4:04:25 PM", "Test Result": "Fail", "Test Start Time": "10/22/2018 6:14:14 AM", "Test End Time": "10/22/2018 6:16:11 AM" }
Code so far:
#define schema
schema = StructType(
[
StructField('Module',ArrayType(StructType(Seq
StructField('PCBA Serial Number',StringType,True),
StructField('Manufacturing Designator',StringType,True),
StructField('Firmware Version',StringType,True),
StructField('Hardware Revision',StringType,True),
StructField('Test Result',StringType,True),
StructField('Test Start Time',StringType,True),
StructField('Test End Time',StringType,True))), True) ,True),
StructField('Test Results',StringType(),True),
StructField('HVM Code Errors',StringType(),True)
]
#use from_json to explode json by applying it to column
df.withColumn("ActivityName", from_json("ActivityName", schema))\
.select(col('ActivityName'))\
.show()
Error:
SyntaxError: invalid syntax
File "<command-1632344621139040>", line 10
StructField('PCBA Serial Number',StringType,True),
^
SyntaxError: invalid syntax
Answered 2020-03-31 02:05:27
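The type should be StringType() rather than StringType, because StructField expects an instance of the type and not the class itself. Also remove Seq, which is Scala syntax and is what Python rejects with the SyntaxError, and use [] in its place: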
# Types used by the schema
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

schema = StructType([StructField('Module',ArrayType(StructType([
StructField('PCBA Serial Number',StringType(),True),
StructField('Manufacturing Designator',StringType(),True),
StructField('Firmware Version',StringType(),True),
StructField('Hardware Revision',StringType(),True),
StructField('Test Result',StringType(),True),
StructField('Test Start Time',StringType(),True),
StructField('Test End Time',StringType(),True)])), True),
StructField('Test Results',StringType(),True),
StructField('HVM Code Errors',StringType(),True)])
https://stackoverflow.com/questions/60942137