
Create a DataFrame by iterating over the values of a nested dictionary down to the n-th level

Stack Overflow user
Asked 2020-09-05 02:58:07
Answers: 1 | Views: 45 | Followers: 0 | Votes: 0

I have a JSON file downloaded from this link/website (human diseases ICD-11 classification). The data is nested up to 8 levels deep, for example:

    "name":"br08403",
    "children":[
    {
        "name":"01 Certain infectious or parasitic diseases",
        "children":[
        {
            "name":"Gastroenteritis or colitis of infectious origin",
            "children":[
            {
                "name":"Bacterial intestinal infections",
                "children":[
                {
                    "name":"1A00  Cholera",
                    "children":[
                    {
                        "name":"H00110  Cholera"
                    }
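
To check how deep a tree of this shape actually goes, a small recursive helper can be used. This is only a minimal sketch: `max_depth` and `sample` are illustrative names, and `sample` mirrors just the truncated excerpt above, not the full file.

```python
def max_depth(node):
    """Return the number of levels in a {"name", "children"} tree rooted at node."""
    children = node.get("children", [])
    if not children:
        return 1
    return 1 + max(max_depth(child) for child in children)


# Hypothetical sample mirroring the excerpt above (not the full file).
sample = {
    "name": "br08403",
    "children": [{
        "name": "01 Certain infectious or parasitic diseases",
        "children": [{
            "name": "Gastroenteritis or colitis of infectious origin",
            "children": [{
                "name": "Bacterial intestinal infections",
                "children": [{
                    "name": "1A00  Cholera",
                    "children": [{"name": "H00110  Cholera"}],
                }],
            }],
        }],
    }],
}

depth = max_depth(sample)
print(depth)
```

Running the same helper on the full parsed file would show how many level columns the final DataFrame needs.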

I tried the following code:

import pandas as pd


def flatten_json(nested_json):
    """
        Flatten json object with nested keys into a single level.
        Args:
            nested_json: A nested json object.
        Returns:
            The flattened json object if successful, None otherwise.
    """
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# `dictionary` is the parsed JSON object (a nested dict)
df2 = pd.Series(flatten_json(dictionary)).to_frame()

The output I get is:

name    br08403
children_0_name 01 Certain infectious or parasitic diseases
children_0_children_0_name  Gastroenteritis or colitis of infectious origin
children_0_children_0_children_0_name   Bacterial intestinal infections
children_0_children_0_children_0_children_0_name    1A00 Cholera
... ...
children_21_children_17_children_10_name    NF0A Certain early complications of trauma, n...
children_21_children_17_children_11_name    NF0Y Other specified effects of external causes
children_21_children_17_children_12_name    NF0Z Unspecified effects of external causes
children_21_children_18_name    NF2Y Other specified injury, poisoning or cer...
children_21_children_19_name    NF2Z Unspecified injury, poisoning or certain..

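But the desired output is a DataFrame with 8 columns, one per nesting level, holding the "name" values down to the deepest level, for example: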

Any help would be appreciated.

The code I tried for extracting the "name" attribute by building a DataFrame is as follows:

import json

import pandas as pd

with open('br08403.json') as f:
    d = json.load(f)
df2 = pd.DataFrame(d)

data = []
for a in range(len(df2)):
#     print(df2['children'][a]['name'])
    data.append(df2['children'][a]['name'])
    for b in range(len(df2['children'][a]['children'])):
#         print(df2['children'][a]['children'][b]['name'])
        data.append(df2['children'][a]['children'][b]['name'])
        if len(df2['children'][a]['children'][b]) < 2:
            print(df2['children'][a]['children'][b]['name'])
        else:
            for c in range(len(df2['children'][a]['children'][b]['children'])):
#                 print(df2['children'][a]['children'][b]['children'][c]['name'])
                data.append(df2['children'][a]['children'][b]['children'][c]['name'])
                if len(df2['children'][a]['children'][b]['children'][c]) < 2:
                    print(df2['children'][a]['children'][b]['children'][c]['name'])
                else:
                    for d in range(len(df2['children'][a]['children'][b]['children'][c]['children'])):
#                         print(df2['children'][a]['children'][b]['children'][c]['children'][d]['name'])
                        data.append(df2['children'][a]['children'][b]['children'][c]['children'][d]['name'])

But I just get a flat list, as shown below:

['01 Certain infectious or parasitic diseases',
 'Gastroenteritis or colitis of infectious origin',
 'Bacterial intestinal infections',
 '1A00  Cholera',
 '1A01  Intestinal infection due to other Vibrio',
 '1A02  Intestinal infections due to Shigella',
 '1A03  Intestinal infections due to Escherichia coli',
 '1A04  Enterocolitis due to Clostridium difficile',
 '1A05  Intestinal infections due to Yersinia enterocolitica',
 '1A06  Gastroenteritis due to Campylobacter',
 '1A07  Typhoid fever',
 '1A08  Paratyphoid Fever',
 '1A09  Infections due to other Salmonella',....
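
The nested loops above hard-code four levels and discard each name's depth, which is why the result is a flat list. A recursive walk, sketched below, handles any depth and keeps the level next to each name. `walk` and `tree` are illustrative names, and the small `tree` is a hypothetical stand-in for the loaded JSON.

```python
def walk(node, depth=0):
    """Yield (depth, name) for every node in a {"name", "children"} tree."""
    yield depth, node["name"]
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


# Hypothetical stand-in for the loaded JSON file.
tree = {
    "name": "root",
    "children": [
        {"name": "A", "children": [{"name": "A1"}, {"name": "A2"}]},
        {"name": "B"},
    ],
}

pairs = list(walk(tree))
for depth, name in pairs:
    # Indent each name by its depth to make the hierarchy visible
    print("  " * depth + name)
```

Because the depth travels with each name, the hierarchy can still be reconstructed afterwards, which the plain list of names does not allow.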

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-09-09 02:43:54

A simple, pandas-only iterative approach.

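import pandas as pd
import requests

res = requests.get("https://www.genome.jp/kegg-bin/download_htext?htext=br08403.keg&format=json&filedir=")
js = res.json()

df = pd.json_normalize(js)
for i in range(20):  # upper bound on depth; the loop breaks earlier once no children remain
    # One row per child, then flatten each child dict into columns
    df = pd.json_normalize(df.explode("children").to_dict(orient="records"))
    # A bare "children" column is left over from rows that had no child dict; drop it
    if "children" in df.columns:
        df.drop(columns="children", inplace=True)
    df = df.rename(columns={"children.name": f"level{i}", "children.children": "children"})
    if df[f"level{i}"].isna().all() or "children" not in df.columns:
        break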
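
As an alternative to the iterative pandas loop, and not part of the original answer, a similar table can be sketched with a plain recursive traversal that emits one row per root-to-leaf path. `iter_paths`, `tree`, and the `level{i}` column names are assumptions made for illustration. Here `level0` holds the root name, and shorter paths are padded with `None`.

```python
import pandas as pd


def iter_paths(node, prefix=()):
    """Yield the tuple of names along each root-to-leaf path."""
    path = prefix + (node["name"],)
    children = node.get("children") or []
    if not children:
        yield path
    for child in children:
        yield from iter_paths(child, path)


# Hypothetical stand-in for the parsed JSON (`js` in the answer above).
tree = {
    "name": "root",
    "children": [
        {"name": "A", "children": [{"name": "A1"}, {"name": "A2"}]},
        {"name": "B"},
    ],
}

rows = list(iter_paths(tree))
width = max(len(row) for row in rows)

# Pad shorter paths so every row has the same number of level columns
df = pd.DataFrame(
    [row + (None,) * (width - len(row)) for row in rows],
    columns=[f"level{i}" for i in range(width)],
)
print(df)
```

This keeps the traversal in plain Python and builds the DataFrame once at the end, which may be easier to follow than repeatedly exploding and normalizing.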
Votes: 1

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/63746691
