
Converting many JSON dictionaries in a .json file into a list of JSON records

Asked by a Stack Overflow user on 2018-02-26 06:49:14

I have a .json file with 448 records. The file is formatted as follows (2 records shown as sample data):

{
    "_id" : ObjectId("5a5faa4f8b91277fde0212b1"),
    "geo_accession" : [ 
        "GSE86910"
    ],
    "title" : [ 
        "RNA-seq transcriptonal profiling in human primary adult erythroid progenitor celression"
    ],
    "summary" : [ 
        "The developing erythroid cerythroid cells, and performed RNA-seq transcriptional profiling analysis."
    ],
    "num_samples" : 6,
    "overall_design" : [ 
        "Human primary adult erythroblasts were generated ex vivo from extracted for RNA-seq analysis."
    ],
    "samples" : {
        "GSM2310252" : {
            "title" : "RNAseq_A5-ProE-shNT-rep1",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310253" : {
            "title" : "RNAseq_A5-ProE-shNT-rep2",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310254" : {
            "title" : "RNAseq_A5-ProE-shTFAM-rep1",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310255" : {
            "title" : "RNAseq_A5-ProE-shTFAM-rep2",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310256" : {
            "title" : "RNAseq_A5-ProE-shPHB2-rep1",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310257" : {
            "title" : "RNAseq_A5-ProE-shPHB2-rep2",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        }
    },
    "geo_signal" : {}
}


{
    "_id" : ObjectId("5a5faa4f8b91277fde0212b6"),
    "geo_accession" : [ 
        "GSE83592"
    ],
    "title" : [ 
        "JQ1 +/- Vemurafenib in BRAF mutant melanoma (A375)"
    ],
    "summary" : [ 
        "The apoptotic genes significantly down-regulated."
    ],
    "num_samples" : 2,
    "overall_design" : [ 
        "dsf"
    ],
    "samples" : {
        "GSM2210563" : {
            "title" : "16L",
            "source_name_ch1" : "A375 cell line",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2210564" : {
            "title" : "16R",
            "source_name_ch1" : "A375 cell line",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
            },
    "geo_signal" : {}
}

I am perfectly fine with this format, but json.load obviously does not work on it and gives this error:

raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1
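
For context, the "Extra data" error can be reproduced without the real file. The sketch below is a minimal illustration, assuming made-up miniature data: two top-level objects placed back to back, the same way the records sit in the file. The parser reads the first object and then complains about everything after it.

```python
import json

# Two top-level objects back to back, like the records in the file.
# This is made-up miniature data, not taken from the real file.
text = '{"num_samples": 6}\n{"num_samples": 2}'

try:
    json.loads(text)
    message = None
except ValueError as exc:  # on Python 3, json.JSONDecodeError subclasses ValueError
    message = str(exc)

# The message starts with "Extra data" and points at the second object.
print(message)
```

The sample records have two further problems for a strict parser: ObjectId(...) is mongo shell syntax rather than JSON, and the second record has a trailing comma before a closing brace.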

Is there a way to convert all of this into a JSON-formatted list of records, like this?

    [
    {
    "_id" : ObjectId("5a5faa4f8b91277fde0212b1"),
    "geo_accession" : [ 
        "GSE86910"
    ],
    "title" : [ 
        "RNA-seq transcriptonal profiling in human primary adult erythroid progenitor celression"
    ],
    "summary" : [ 
        "The developing erythroid cerythroid cells, and performed RNA-seq transcriptional profiling analysis."
    ],
    "num_samples" : 6,
    "overall_design" : [ 
        "Human primary adult erythroblasts were generated ex vivo from extracted for RNA-seq analysis."
    ],
    "samples" : {
        "GSM2310252" : {
            "title" : "RNAseq_A5-ProE-shNT-rep1",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310253" : {
            "title" : "RNAseq_A5-ProE-shNT-rep2",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310254" : {
            "title" : "RNAseq_A5-ProE-shTFAM-rep1",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310255" : {
            "title" : "RNAseq_A5-ProE-shTFAM-rep2",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310256" : {
            "title" : "RNAseq_A5-ProE-shPHB2-rep1",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2310257" : {
            "title" : "RNAseq_A5-ProE-shPHB2-rep2",
            "treatment_protocol_ch1" : "NA",
            "source_name_ch1" : "Human primary adult proerythroblasts (ProEs)",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        }
    },
    "geo_signal" : {}
}


{
    "_id" : ObjectId("5a5faa4f8b91277fde0212b6"),
    "geo_accession" : [ 
        "GSE83592"
    ],
    "title" : [ 
        "JQ1 +/- Vemurafenib in BRAF mutant melanoma (A375)"
    ],
    "summary" : [ 
        "The apoptotic genes significantly down-regulated."
    ],
    "num_samples" : 2,
    "overall_design" : [ 
        "dsf"
    ],
    "samples" : {
        "GSM2210563" : {
            "title" : "16L",
            "source_name_ch1" : "A375 cell line",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
        "GSM2210564" : {
            "title" : "16R",
            "source_name_ch1" : "A375 cell line",
            "organism_ch1" : "Homo sapiens",
            "library_strategy" : "RNA-Seq"
        },
            },
    "geo_signal" : {}
}
    ]

Preferably in Python. Thanks.

1 Answer

Answered by a Stack Overflow user on 2018-02-26 08:38:22

Use regular expressions to turn the file's contents into valid JSON.

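import json
import re

# Read the whole dump ("file/test.json" is the path used in this answer).
with open("file/test.json", "r") as f:
    text = f.read()

# Drop trailing commas before a closing brace, which strict JSON rejects.
text = re.sub(r",\s*}", "\n}", text)
# Unwrap ObjectId("...") so that only the quoted string remains.
text = re.sub(r'ObjectId\((".*?")\)', r"\1", text)

# The file holds several top-level objects back to back, so a single
# json.loads() call would still raise "Extra data"; decode them one by one.
decoder = json.JSONDecoder()
records = []
pos = 0
while pos < len(text):
    if text[pos].isspace():
        pos += 1  # raw_decode does not skip leading whitespace itself
        continue
    record, pos = decoder.raw_decode(text, pos)
    records.append(record)

for record in records:
    print(record)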
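
To get the list-of-records output the question asks for, the same idea can be wrapped in a function and the result serialized with json.dumps. This is only a sketch under stated assumptions: dump_to_records is a hypothetical helper name, the sample text is a shortened version of the records in the question, and the regular expressions would also rewrite a matching sequence such as ", }" if it appeared inside a string value.

```python
import json
import re


def dump_to_records(text):
    """Parse concatenated mongo-shell style documents into a list of dicts.

    Hypothetical helper for illustration; not part of any library.
    """
    # Remove trailing commas before a closing brace (not allowed in JSON).
    text = re.sub(r",\s*}", "}", text)
    # Replace ObjectId("...") with the bare quoted string.
    text = re.sub(r'ObjectId\(\s*("[^"]*")\s*\)', r"\1", text)

    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # raw_decode does not skip leading whitespace itself
            continue
        record, pos = decoder.raw_decode(text, pos)
        records.append(record)
    return records


# Shortened sample modeled on the two records in the question.
sample = '''
{
    "_id" : ObjectId("5a5faa4f8b91277fde0212b1"),
    "num_samples" : 6,
    "geo_signal" : {}
}

{
    "_id" : ObjectId("5a5faa4f8b91277fde0212b6"),
    "samples" : {
        "GSM2210564" : { "title" : "16R" },
    },
    "geo_signal" : {}
}
'''

records = dump_to_records(sample)
# Serialize the whole list as one valid JSON array.
as_json_list = json.dumps(records, indent=4)
print(as_json_list)
```

Writing as_json_list to a new file gives a document that json.load can read back in a single call.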
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/48982869
