首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Python解析JSON并将输出写入CSV

Python解析JSON并将输出写入CSV
EN

Stack Overflow用户
提问于 2021-02-03 05:36:41
回答 3查看 116关注 0票数 0

我正在尝试解析一个包含以下内容的json文件。

代码语言:javascript
复制
{"short_desc":
{"3641":[{"when":1002747507,"what":"DCR: Cant compare from outliner (1GDHJKK)","who":14},{"when":1002771621,"what":"DCR: Can't compare from outliner (1GDHJKK)","who":21}],
"3470":[{"when":1002747341,"what":"Can't compare code editions in type hierarchy view (1GGNI4W)","who":24},{"when":1002771649,"what":"DCR: Can't compare code editions in type hierarchy view (1GGNI4W)","who":21}]

我尝试在每个数组前面获取具有头和每个数字的文本,并将其保存到csv中。预期结果如下。

代码语言:javascript
复制
id  | description
3641 | DCR: Cant compare from outliner (1GDHJKK)
3641 | DCR: Can't compare from outliner (1GDHJKK)
3470 | Can't compare code editions in type hierarchy view (1GGNI4W)
3470 | DCR: Can't compare code editions in type hierarchy view (1GGNI4W)

我尝试了下面的代码,只尝试获得什么的值,但是得到了一个错误KeyError: 'what'

代码语言:javascript
复制
import csv
import json
from glob import glob

# Open CSV output files for reading and writing
input_dir = ""
output_dir = ""

# Open main twitter data CSV file and write header row
output_file = output_dir + "coba.csv"
f_out = open(output_file, 'w', encoding='utf-8')
rowwriter = csv.writer(f_out, delimiter=',', lineterminator='\n')
outputrow = ['description']
rowwriter.writerow(outputrow)

# Define variables
inc = 0

with open('msr2013/data/v02/eclipse/short_desc.json', 'r', encoding='utf-8') as f:
    for line in f:
        bug = json.loads(line)

        # Set standard variables equal to tweet data
        bug = bug['short_desc']['what']          

        # Write to main output file
        outputrow = [bug] 
        rowwriter.writerow(outputrow)

        inc += 1
        # Optional counter increments variables to track progress, useful for very large files.
        if inc%10000 == 0:
            print(inc)

# Close the output file
f_out.close()

有人能给我一个解决办法吗?

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2021-02-03 06:56:59

您可以使用熊猫解析json,应用更改并将输出写入csv。

代码语言:javascript
复制
import pandas as pd
data = pd.read_json('test.json')
data['id'] = data.index
filter_data = data[data['short_desc'].notnull()].to_dict(orient='records')
inner_data = pd.json_normalize(filter_data, record_path='short_desc', meta=['id'],errors='ignore')
inner_data = inner_data[['what','id']]
inner_data = inner_data.rename(columns={'what':'description'})
inner_data.to_csv('test.csv')

产出:

票数 0
EN

Stack Overflow用户

发布于 2021-02-03 06:51:13

根据示例数据,您应该如下所示。

代码语言:javascript
复制
        bug = json.loads(line)

        # Set standard variables equal to tweet data
        for _id in bug['short_desc']:
            for i in range(len(bug['short_desc'][_id])):
                bug_what = bug['short_desc'][_id][i]['what']

                # Write to main output file
                outputrow = [bug_what]

bug['short_desc']是将id映射到数组的对象。

票数 0
EN

Stack Overflow用户

发布于 2021-02-03 07:20:35

根据您的数据示例,您在"short_description".中没有“”键

你应该这样做:

代码语言:javascript
复制
...
for line in f:
    bug = json.loads(line)
    for bug_id, bug_data in bug.get('short_desc', {}).items():
        #For first sample: bug_id='3470' and bug_data=[{...}, {...}] 
        for row in bug_data:
            rowwriter.writerow([bug_id, row['what']])
...  
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/66021862

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档