我有一个JSON文件,我读熊猫和转换成一个数据。然后,我将这个文件导出为CSV,这样我就可以更容易地编辑它。完成后,我将CSV文件读取回dataframe,然后希望将它转换回JSON文件。但是,在这个过程中,大量额外的数据自动添加到我原来的字典列表( JSON文件)中。
我确信我可以破解一个修补程序,但是想知道是否有人知道如何有效地处理这个过程,这样就不会在我的原始JSON数据中添加新的数据或列了?
原始JSON (片段):
[
{
"tag": "!= (not-equal-to operator)",
"definition": "",
"source": [
{
"title": "Compare Dictionaries",
"URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch08.html#idm45795007002280"
}
]
},
{
"tag": "\"intelligent\" applications",
"definition": "",
"source": [
{
"title": "Why Machine Learning?",
"URL": "https://learning.oreilly.com/library/view/introduction-to-machine/9781449369880/https://learning.oreilly.com/library/view/introduction-to-machine/9781449369880/ch01.html#idm45613685872600"
}
]
},
{
"tag": "# (pound sign)",
"definition": "",
"source": [
{
"title": "Comment with #",
"URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch04.html#idm45795038172984"
}
]
},CSV作为数据文件(自动添加索引):
tag definition source
0 != (not-equal-to operator) [{'title': 'Compare Dictionaries', 'URL': 'htt...
1 "intelligent" applications [{'title': 'Why Machine Learning?', 'URL': 'ht...
2 # (pound sign) [{'title': 'Comment with #', 'URL': 'https://l...
3 $ (Mac/Linux prompt) [{'title': 'Test Driving Python', 'URL': 'http...
4 $ (anchor) [{'title': 'Patterns: Using Specifiers', 'URL'...
... ... ... ...
11375 { } (curly brackets) []
11376 | (vertical bar) [{'title': 'Combinations and Operators', 'URL'...
11377 || (concatenation) function (DB2/Oracle/Postgr... [{'title': 'Discussion', 'URL': 'https://learn...
11378 || (for Oracle Database) [{'title': 'Including special characters', 'UR...
11379 || (vertical bar, double), concatenation opera... [{'title': 'Including special characters', 'UR...
7009 rows × 3 columns从CSV转换后的JSON文件(各种糟糕的):
{
"0":{
"Unnamed: 0":0,
"tag":"!= (not-equal-to operator)",
"definition":null,
"source":"[{'title': 'Compare Dictionaries', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch08.html#idm45795007002280'}]"
},
"1":{
"Unnamed: 0":1,
"tag":"\"intelligent\" applications",
"definition":null,
"source":"[{'title': 'Why Machine Learning?', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/ch01.html#idm45613685872600'}]"
},
"2":{
"Unnamed: 0":2,
"tag":"# (pound sign)",
"definition":null,
"source":"[{'title': 'Comment with #', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch04.html#idm45795038172984'}]"
},这里是我的代码:
import pandas as pd
import json
# to dataframe
tags_df = pd.read_json('dsa_tags_flat.json')
# csv file was manually cleaned then reloaded here
cleaned_csv_df = pd.read_csv('dsa-parser-flat.csv')
# write to JSON
cleaned_csv_df.to_json(r'dsa-tags.json', orient='index', indent=2)编辑:在从dataframe到CSV的过程中,我向代码中添加了一个index=false,这很有帮助,但是仍然有原来的JSON中没有的键索引。我想知道某个地方的图书馆能不能阻止这一切?还是我只需要自己写一些循环并删除它们?
另外,正如您所看到的,URL正斜杠被转义了。不是我想要的。
{
"0":{
"tag":"!= (not-equal-to operator)",
"definition":null,
"source":"[{'title': 'Compare Dictionaries', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch08.html#idm45795007002280'}]"
},
"1":{
"tag":"\"intelligent\" applications",
"definition":null,
"source":"[{'title': 'Why Machine Learning?', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/ch01.html#idm45613685872600'}]"
},
"2":{
"tag":"# (pound sign)",
"definition":null,
"source":"[{'title': 'Comment with #', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch04.html#idm45795038172984'}]"
},
"3":{
"tag":"$ (Mac\/Linux prompt)",
"definition":null,
"source":"[{'title': 'Test Driving Python', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/data-wrangling-with\/9781491948804\/ch01.html#idm140080973230480'}]"
},发布于 2022-01-21 22:07:18
问题是,您要在两个位置添加索引。
一次当你把文件写到csv的时候。这将在最终的JSON文件中添加"Unnamed: 0“字段。您可以在将CSV写入磁盘时在to_csv方法中使用to_csv,或者在读取read_csv中保存的CSV时指定index_col参数。
其次,在使用orient="index"将df写入json时添加了一个索引。这将在最终的JSON文件中添加最外层的索引,如"0“、"1”。如果您打算以与加载json类似的格式保存json,则应该使用orient="records"。
要了解orient参数是如何工作的,请参阅json
https://stackoverflow.com/questions/70807989
复制相似问题