专家们,
我们有一个从加载到表中的JSON数据推断模式的场景。它必须是动态完成的,而且表中的JSON数据将具有不同的模式。
示例:
row 1-> address <array>[ id string ,name string ]
row 2-> address<array> [addr<object> {id:"1",name:"abc"}]
row 3-> address<array> [addr<object> {id:"2",name:"dfg",Zips<array>[zip1:6009,zip2:789]}]我知道我们可以使用横向扁平和递归来推断模式。然而,当我们需要切碎上面的数据时,我们需要扁平查询,如下所示。
LATERAL FLATTEN (jsondata:address ,recursive =>true) a
LATERAL FLATTEN (a.value:addr,recursive =>true) b -> this is causing issue
LATERAL FLATTEN (c.value:Zips,recursive =>true) c 当我们展平对象数据类型时,它被展平到元素级别,有没有一种方法来检查和动态避免对象展平。
致以问候,戈壁
发布于 2020-06-27 16:05:25
Snowflake的semi-structured data query特性提供了data-type inspection functions,可用于在单个表列中有条件地处理这种不同的输入。
特别是,在分解outer arrays into whole rows之后,您可以使用IS_ARRAY、IS_OBJECT和: operator (带有NULL结果检查)函数来分离记录生成逻辑,然后使用UNION ALL将行合并到单个输出中。
问题lacks a clear/usable sample or schema of data (and an expected output)所以我对数据的外观做了四个假设,并从根开始为每种类型添加了一个过滤器。大体思路保持不变(检查类型,划分数据集,处理每种类型),你应该能够推断和调整。
WITH tbl AS (
-- Sample table data
select parse_json('{"address": [["sa_id1", "sa_name1"], ["sa_id2", "sa_name2"]], "other_outer_field": 1}') jsondata
union all
select parse_json('{"address": [{"id": "sr_id1", "name": "sr_name1"}, {"id": "sr_id2", "name": "sr_name2"}], "other_outer_field": 2}') jsondata
union all
select parse_json('{"address": [{"id": "zr_id1", "name": "zr_name1", "zips": ["10001", "10002", "10003"]}, {"id": "zr_id2", "name": "zr_name2", "zips": ["20001", "20002"]}], "other_outer_field": 3}') jsondata
union all
select parse_json('{"address": {"id": "zr_id1", "name": "zr_name1", "zips": ["10001", "10002", "10003"]}, "other_outer_field": 4}') jsondata
), all_address_array_formats AS (
-- Table's actual row: { …, "address": [ … ], … } when the address field is an array
SELECT
jsondata:other_outer_field AS o_f,
each_address.value AS address_container
FROM tbl, LATERAL FLATTEN(jsondata:address) each_address
WHERE IS_ARRAY(jsondata:address)
), all_address_object_formats AS (
-- Table's actual row: { …, "address": { … }, … } when the address field is an object (we do not need to flatten here)
SELECT
jsondata:other_outer_field AS o_f,
jsondata:address AS address_container
FROM tbl
WHERE IS_OBJECT(jsondata:address)
), just_array_members AS (
-- For address array with nested arrays: [ [id1, name1], [id2, name2], … ]
SELECT
o_f,
address_container[0]::varchar AS id,
address_container[1]::varchar AS name,
NULL AS zipcode
FROM all_address_array_formats
WHERE
IS_ARRAY(address_container)
), simple_record_members AS (
-- For address array with objects, but no zipcode fields: [ { id: id1, name: name1 }, { id: id2, name: name2 }, … ]
SELECT
o_f,
address_container:id::varchar AS id,
address_container:name::varchar AS name,
NULL AS zipcode
FROM all_address_array_formats
WHERE
IS_OBJECT(address_container)
AND address_container:zips IS NULL
), zipcode_record_members AS (
-- For address array with objects, each with multiple zipcodes: [ { id: id1, name: name1, zips: [ zip1_1, zip1_2, … ] }, { id: id2, name: name2, zips: [zip2_1, zip2_2, …] }, … ]
SELECT
o_f,
address_container:id::varchar AS id,
address_container:name::varchar AS name,
per_zip.value::varchar AS zipcode
FROM all_address_array_formats, LATERAL FLATTEN(address_container:zips) per_zip
WHERE
IS_OBJECT(address_container)
AND address_container:zips IS NOT NULL
), zipcodes_within_object AS (
-- For address of object type, a single one with multiple zipcodes: { id: id1, name: name1, zips: [ zip1_1, zip1_2, … ] }
SELECT
o_f,
address_container:id::varchar AS id,
address_container:name::varchar AS name,
per_zip.value::varchar AS zipcode
FROM all_address_object_formats, LATERAL FLATTEN(address_container:zips) per_zip
WHERE
IS_OBJECT(address_container)
AND address_container:zips IS NOT NULL
)
SELECT o_f, id, name, zipcode FROM just_array_members UNION ALL
SELECT o_f, id, name, zipcode FROM simple_record_members UNION ALL
SELECT o_f, id, name, zipcode FROM zipcode_record_members UNION ALL
SELECT o_f, id, name, zipcode FROM zipcodes_within_object;列注:该示例还展示了如何继续携带表中原始对象(:)中除address之外的任何其他字段,这些字段未通过flatten函数分解。
对于输入:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSONDATA |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {"address": [["sa_id1", "sa_name1"], ["sa_id2", "sa_name2"]], "other_outer_field": 1} |
| {"address": [{"id": "sr_id1", "name": "sr_name1"}, {"id": "sr_id2", "name": "sr_name2"}], "other_outer_field": 2} |
| {"address": [{"id": "zr_id1", "name": "zr_name1", "zips": ["10001", "10002", "10003"]}, {"id": "zr_id2", "name": "zr_name2", "zips": ["20001", "20002"]}], "other_outer_field": 3} |
| {"address": {"id": "zr_id1", "name": "zr_name1", "zips": ["10001", "10002", "10003"]}, "other_outer_field": 4} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+这会产生以下结果:
+-----+--------+----------+---------+
| O_F | ID | NAME | ZIPCODE |
|-----+--------+----------+---------|
| 1 | sa_id1 | sa_name1 | NULL |
| 1 | sa_id2 | sa_name2 | NULL |
| 2 | sr_id1 | sr_name1 | NULL |
| 2 | sr_id2 | sr_name2 | NULL |
| 3 | zr_id1 | zr_name1 | 10001 |
| 3 | zr_id1 | zr_name1 | 10002 |
| 3 | zr_id1 | zr_name1 | 10003 |
| 3 | zr_id2 | zr_name2 | 20001 |
| 3 | zr_id2 | zr_name2 | 20002 |
| 4 | zr_id1 | zr_name1 | 10001 |
| 4 | zr_id1 | zr_name1 | 10002 |
| 4 | zr_id1 | zr_name1 | 10003 |
+-----+--------+----------+---------+https://stackoverflow.com/questions/62604149
复制相似问题