我正在尝试解析一个包含以下内容的json文件。
{"short_desc":
{"3641":[{"when":1002747507,"what":"DCR: Cant compare from outliner (1GDHJKK)","who":14},{"when":1002771621,"what":"DCR: Can't compare from outliner (1GDHJKK)","who":21}],
"3470":[{"when":1002747341,"what":"Can't compare code editions in type hierarchy view (1GGNI4W)","who":24},{"when":1002771649,"what":"DCR: Can't compare code editions in type hierarchy view (1GGNI4W)","who":21}]我尝试在每个数组前面获取具有,头和每个数字的文本,并将其保存到csv中。预期结果如下。
id | description
3641 | DCR: Cant compare from outliner (1GDHJKK)
3641 | DCR: Can't compare from outliner (1GDHJKK)
3470 | Can't compare code editions in type hierarchy view (1GGNI4W)
3470 | DCR: Can't compare code editions in type hierarchy view (1GGNI4W)我尝试了下面的代码,只尝试获得什么的值,但是得到了一个错误KeyError: 'what'
import csv
import json
from glob import glob
# Open CSV output files for reading and writing
input_dir = ""
output_dir = ""
# Open main twitter data CSV file and write header row
output_file = output_dir + "coba.csv"
f_out = open(output_file, 'w', encoding='utf-8')
rowwriter = csv.writer(f_out, delimiter=',', lineterminator='\n')
outputrow = ['description']
rowwriter.writerow(outputrow)
# Define variables
inc = 0
with open('msr2013/data/v02/eclipse/short_desc.json', 'r', encoding='utf-8') as f:
for line in f:
bug = json.loads(line)
# Set standard variables equal to tweet data
bug = bug['short_desc']['what']
# Write to main output file
outputrow = [bug]
rowwriter.writerow(outputrow)
inc += 1
# Optional counter increments variables to track progress, useful for very large files.
if inc%10000 == 0:
print(inc)
# Close the output file
f_out.close()有人能给我一个解决办法吗?
发布于 2021-02-03 06:56:59
您可以使用熊猫解析json,应用更改并将输出写入csv。
import pandas as pd
data = pd.read_json('test.json')
data['id'] = data.index
filter_data = data[data['short_desc'].notnull()].to_dict(orient='records')
inner_data = pd.json_normalize(filter_data, record_path='short_desc', meta=['id'],errors='ignore')
inner_data = inner_data[['what','id']]
inner_data = inner_data.rename(columns={'what':'description'})
inner_data.to_csv('test.csv')产出:

发布于 2021-02-03 06:51:13
根据示例数据,您应该如下所示。
bug = json.loads(line)
# Set standard variables equal to tweet data
for _id in bug['short_desc']:
for i in range(len(bug['short_desc'][_id])):
bug_what = bug['short_desc'][_id][i]['what']
# Write to main output file
outputrow = [bug_what]bug['short_desc']是将id映射到数组的对象。
发布于 2021-02-03 07:20:35
根据您的数据示例,您在"short_description".中没有“”键

你应该这样做:
...
for line in f:
bug = json.loads(line)
for bug_id, bug_data in bug.get('short_desc', {}).items():
#For first sample: bug_id='3470' and bug_data=[{...}, {...}]
for row in bug_data:
rowwriter.writerow([bug_id, row['what']])
... https://stackoverflow.com/questions/66021862
复制相似问题