Q: Creating a dataframe by iterating through the values of a nested dictionary up to the nth level

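Stack Overflow user

Asked 2020-09-05 02:58:07

Answers: 1 · Views: 45 · Followers: 0 · Votes: 0

I have a JSON file downloaded from this link/website (human diseases ICD-11 classification). The data is nested up to 8 levels deep, for example: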

    "name":"br08403",
    "children":[
    {
        "name":"01 Certain infectious or parasitic diseases",
        "children":[
        {
            "name":"Gastroenteritis or colitis of infectious origin",
            "children":[
            {
                "name":"Bacterial intestinal infections",
                "children":[
                {
                    "name":"1A00  Cholera",
                    "children":[
                    {
                        "name":"H00110  Cholera"
                    }

I tried the following code:

import pandas as pd

def flatten_json(nested_json):
    """
        Flatten json object with nested keys into a single level.
        Args:
            nested_json: A nested json object.
        Returns:
            The flattened json object.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: strip the trailing '_' from the accumulated key.
            out[name[:-1]] = x

    flatten(nested_json)
    return out

# `dictionary` is assumed to hold the parsed JSON.
df2 = pd.Series(flatten_json(dictionary)).to_frame()

The output I get is:

name    br08403
children_0_name 01 Certain infectious or parasitic diseases
children_0_children_0_name  Gastroenteritis or colitis of infectious origin
children_0_children_0_children_0_name   Bacterial intestinal infections
children_0_children_0_children_0_children_0_name    1A00 Cholera
... ...
children_21_children_17_children_10_name    NF0A Certain early complications of trauma, n...
children_21_children_17_children_11_name    NF0Y Other specified effects of external causes
children_21_children_17_children_12_name    NF0Z Unspecified effects of external causes
children_21_children_18_name    NF2Y Other specified injury, poisoning or cer...
children_21_children_19_name    NF2Z Unspecified injury, poisoning or certain..

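But the desired output is a dataframe with 8 columns, one per nesting level, holding the "name" keys down to the deepest level, for example:

Any help would be appreciated.

I also tried to extract the "name" attribute by building a dataframe, with the following code: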

import json
import pandas as pd

with open('br08403.json') as f:
    d = json.load(f)
df2 = pd.DataFrame(d)

data = []
for a in range(len(df2)):
#     print(df2['children'][a]['name'])
    data.append(df2['children'][a]['name'])
    for b in range(len(df2['children'][a]['children'])):
#         print(df2['children'][a]['children'][b]['name'])
        data.append(df2['children'][a]['children'][b]['name'])
        if len(df2['children'][a]['children'][b]) < 2:
            print(df2['children'][a]['children'][b]['name'])
        else:
            for c in range(len(df2['children'][a]['children'][b]['children'])):
#                 print(df2['children'][a]['children'][b]['children'][c]['name'])
                data.append(df2['children'][a]['children'][b]['children'][c]['name'])
                if len(df2['children'][a]['children'][b]['children'][c]) < 2:
                    print(df2['children'][a]['children'][b]['children'][c]['name'])
                else:
                    for d in range(len(df2['children'][a]['children'][b]['children'][c]['children'])):
#                         print(df2['children'][a]['children'][b]['children'][c]['children'][d]['name'])
                        data.append(df2['children'][a]['children'][b]['children'][c]['children'][d]['name'])

But I get a flat list like this:

['01 Certain infectious or parasitic diseases',
 'Gastroenteritis or colitis of infectious origin',
 'Bacterial intestinal infections',
 '1A00  Cholera',
 '1A01  Intestinal infection due to other Vibrio',
 '1A02  Intestinal infections due to Shigella',
 '1A03  Intestinal infections due to Escherichia coli',
 '1A04  Enterocolitis due to Clostridium difficile',
 '1A05  Intestinal infections due to Yersinia enterocolitica',
 '1A06  Gastroenteritis due to Campylobacter',
 '1A07  Typhoid fever',
 '1A08  Paratyphoid Fever',
 '1A09  Infections due to other Salmonella',....
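
The nested loops above need one hand-written loop per level, and the flat list loses which parent each name belongs to. One way to keep that information is a recursive walk that yields the full chain of names for every leaf. The following is only a minimal sketch under assumptions: `iter_paths` is a hypothetical helper, and `tree` is a tiny hand-made stand-in shaped like the sample above, not the real file.

```python
import pandas as pd

def iter_paths(node, prefix=()):
    """Yield one tuple of names per leaf, from the root down to that leaf."""
    path = prefix + (node["name"],)
    children = node.get("children")
    if not children:
        yield path
        return
    for child in children:
        yield from iter_paths(child, path)

# Hand-made example data in the same shape as the sample (not the real file).
tree = {
    "name": "br08403",
    "children": [
        {
            "name": "01 Certain infectious or parasitic diseases",
            "children": [
                {
                    "name": "Gastroenteritis or colitis of infectious origin",
                    "children": [{"name": "1A00  Cholera"}],
                },
                {"name": "Made-up leaf without deeper levels"},
            ],
        }
    ],
}

# Drop the root name so that level0 is the first real category.
rows = [path[1:] for path in iter_paths(tree)]

# Pad shorter paths with None so every row has the same number of columns.
width = max(len(row) for row in rows)
rows = [row + (None,) * (width - len(row)) for row in rows]

df_paths = pd.DataFrame(rows, columns=[f"level{i}" for i in range(width)])
print(df_paths)
```

Each row is one leaf, and branches that end early are left empty in the deeper columns. On the real file the number of columns would follow from the deepest branch rather than being fixed in advance.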

pandas

dictionary

python

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-09-09 02:43:54

A simple, pandas-only iterative approach.

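import pandas as pd
import requests

res = requests.get("https://www.genome.jp/kegg-bin/download_htext?htext=br08403.keg&format=json&filedir=")
js = res.json()

df = pd.json_normalize(js)
for i in range(20):  # upper bound on the nesting depth
    # One row per child, then flatten each child dict into "children.*" columns.
    df = pd.json_normalize(df.explode("children").to_dict(orient="records"))
    # Rows that had no children leave a NaN "children" column behind; drop it.
    if "children" in df.columns:
        df = df.drop(columns="children")
    df = df.rename(columns={"children.name": f"level{i}", "children.children": "children"})
    # Stop once this level is empty or no deeper level remains.
    if df[f"level{i}"].isna().all() or "children" not in df.columns:
        break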
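
To see what this loop produces without downloading the file, it can be run on a small in-memory tree. The sketch below wraps the same steps in a hypothetical helper named `expand_levels`; the helper name, the `max_depth` parameter, and the sample data are assumptions made for illustration and are not part of the answer.

```python
import pandas as pd

def expand_levels(tree, max_depth=20):
    """Peel off one nesting level per pass into columns level0, level1, ..."""
    df = pd.json_normalize(tree)
    for i in range(max_depth):
        # Explode the list of children into rows, then flatten each child dict.
        df = pd.json_normalize(df.explode("children").to_dict(orient="records"))
        # Rows whose node had no children keep a NaN "children" column.
        if "children" in df.columns:
            df = df.drop(columns="children")
        df = df.rename(
            columns={"children.name": f"level{i}", "children.children": "children"}
        )
        if df[f"level{i}"].isna().all() or "children" not in df.columns:
            break
    return df

# Hand-made example data in the same shape as the KEGG file (not the real file).
sample = {
    "name": "br08403",
    "children": [
        {
            "name": "01 Certain infectious or parasitic diseases",
            "children": [
                {
                    "name": "Gastroenteritis or colitis of infectious origin",
                    "children": [{"name": "1A00  Cholera"}],
                },
                {"name": "Made-up leaf without deeper levels"},
            ],
        }
    ],
}

result = expand_levels(sample)
print(result)
```

The result has one row per leaf, with the root in `name` and each deeper level in its own `level{i}` column. Branches that stop early hold NaN in the remaining level columns.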

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/63746691
