How to efficiently fix a JSON file converted from a pandas DataFrame

Stack Overflow user
Asked 2022-01-21 21:51:46
Answers 1, Views 206, Followers 0, Votes 0

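I have a JSON file that I read into pandas and convert to a DataFrame. I then export it as a CSV so that I can edit it more easily. When I'm done, I read the CSV file back into a DataFrame and want to convert it back to a JSON file. However, in the process, a lot of extra data is automatically added to my original list of dictionaries (the JSON file).

I'm sure I could hack together a fix, but I'm wondering whether anyone knows an efficient way to handle this process so that no new data or columns are added to my original JSON data?

Original JSON (snippet):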

  [
    {
        "tag": "!= (not-equal-to operator)",
        "definition": "",
        "source": [
            {
                "title": "Compare Dictionaries",
                "URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch08.html#idm45795007002280"
            }
        ]
    },
    {
        "tag": "\"intelligent\" applications",
        "definition": "",
        "source": [
            {
                "title": "Why Machine Learning?",
                "URL": "https://learning.oreilly.com/library/view/introduction-to-machine/9781449369880/https://learning.oreilly.com/library/view/introduction-to-machine/9781449369880/ch01.html#idm45613685872600"
            }
        ]
    },
    {
        "tag": "# (pound sign)",
        "definition": "",
        "source": [
            {
                "title": "Comment with #",
                "URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch04.html#idm45795038172984"
            }
        ]
    },

The CSV loaded as a DataFrame (index added automatically):

    tag definition  source
0   != (not-equal-to operator)      [{'title': 'Compare Dictionaries', 'URL': 'htt...
1   "intelligent" applications      [{'title': 'Why Machine Learning?', 'URL': 'ht...
2   # (pound sign)      [{'title': 'Comment with #', 'URL': 'https://l...
3   $ (Mac/Linux prompt)        [{'title': 'Test Driving Python', 'URL': 'http...
4   $ (anchor)      [{'title': 'Patterns: Using Specifiers', 'URL'...
... ... ... ...
11375   { } (curly brackets)        []
11376   | (vertical bar)        [{'title': 'Combinations and Operators', 'URL'...
11377   || (concatenation) function (DB2/Oracle/Postgr...       [{'title': 'Discussion', 'URL': 'https://learn...
11378   || (for Oracle Database)        [{'title': 'Including special characters', 'UR...
11379   || (vertical bar, double), concatenation opera...       [{'title': 'Including special characters', 'UR...
7009 rows × 3 columns

JSON file after converting from the CSV (bad in all sorts of ways):

{
  "0":{
    "Unnamed: 0":0,
    "tag":"!= (not-equal-to operator)",
    "definition":null,
    "source":"[{'title': 'Compare Dictionaries', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch08.html#idm45795007002280'}]"
  },
  "1":{
    "Unnamed: 0":1,
    "tag":"\"intelligent\" applications",
    "definition":null,
    "source":"[{'title': 'Why Machine Learning?', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/ch01.html#idm45613685872600'}]"
  },
  "2":{
    "Unnamed: 0":2,
    "tag":"# (pound sign)",
    "definition":null,
    "source":"[{'title': 'Comment with #', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch04.html#idm45795038172984'}]"
  },

Here is my code:

import pandas as pd
import json

# to dataframe
tags_df = pd.read_json('dsa_tags_flat.json')

# csv file was manually cleaned then reloaded here
cleaned_csv_df = pd.read_csv('dsa-parser-flat.csv')

# write to JSON
cleaned_csv_df.to_json(r'dsa-tags.json', orient='index', indent=2)

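Edit: I added index=False when going from the DataFrame to CSV, which helped, but there are still index keys that were not in the original JSON. Is there a library somewhere that can prevent all of this, or do I just need to write some loops myself and strip them out?

Also, as you can see, the forward slashes in the URLs are escaped. That is not what I want.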

{
    "0":{
        "tag":"!= (not-equal-to operator)",
        "definition":null,
        "source":"[{'title': 'Compare Dictionaries', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch08.html#idm45795007002280'}]"
    },
    "1":{
        "tag":"\"intelligent\" applications",
        "definition":null,
        "source":"[{'title': 'Why Machine Learning?', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/ch01.html#idm45613685872600'}]"
    },
    "2":{
        "tag":"# (pound sign)",
        "definition":null,
        "source":"[{'title': 'Comment with #', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch04.html#idm45795038172984'}]"
    },
    "3":{
        "tag":"$ (Mac\/Linux prompt)",
        "definition":null,
        "source":"[{'title': 'Test Driving Python', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/data-wrangling-with\/9781491948804\/ch01.html#idm140080973230480'}]"
    },

1 Answer

Stack Overflow user

Accepted answer

Posted on 2022-01-21 22:07:18

The problem is that you are adding an index in two places.

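The first time is when you write the file to CSV. This is what adds the "Unnamed: 0" field to the final JSON file. You can either pass index=False to the to_csv method when writing the CSV to disk, or specify the index_col parameter in read_csv when reading the saved CSV back.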
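A minimal sketch of both options, using a small made-up DataFrame and an in-memory buffer instead of the real files (the two sample rows are illustrative, not the asker's full data):

```python
import io

import pandas as pd

# Small stand-in for the real data (illustrative only).
df = pd.DataFrame({
    "tag": ["!= (not-equal-to operator)", "# (pound sign)"],
    "definition": ["x", "y"],
})

# Default to_csv writes the row index as an unnamed first column,
# which read_csv then loads as a regular column.
csv_with_index = df.to_csv()
reloaded = pd.read_csv(io.StringIO(csv_with_index))
print(list(reloaded.columns))  # ['Unnamed: 0', 'tag', 'definition']

# Option 1: do not write the index in the first place.
csv_no_index = df.to_csv(index=False)
fixed_on_write = pd.read_csv(io.StringIO(csv_no_index))

# Option 2: keep the CSV as-is and tell read_csv that column 0 is the index.
fixed_on_read = pd.read_csv(io.StringIO(csv_with_index), index_col=0)

print(list(fixed_on_write.columns))  # ['tag', 'definition']
print(list(fixed_on_read.columns))   # ['tag', 'definition']
```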

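The second time is when you write the DataFrame to JSON with orient="index". This is what adds the outermost index keys such as "0" and "1" to the final JSON file. If you intend to save the JSON in a format similar to the one you loaded, you should use orient="records".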
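To see the difference between the two orientations, here is a small sketch on a made-up two-row DataFrame (the sample rows are illustrative):

```python
import json

import pandas as pd

# Small stand-in for the real data (illustrative only).
df = pd.DataFrame({
    "tag": ["!= (not-equal-to operator)", "# (pound sign)"],
    "definition": ["", ""],
})

# orient="index" produces one JSON object keyed by the row labels.
by_index = json.loads(df.to_json(orient="index"))
print(list(by_index.keys()))  # ['0', '1']

# orient="records" produces a JSON array of row objects,
# which matches the list-of-dictionaries shape of the original file.
by_records = json.loads(df.to_json(orient="records"))
print(by_records[0])  # {'tag': '!= (not-equal-to operator)', 'definition': ''}
```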

To understand how the orient parameter works, see the pandas documentation for DataFrame.to_json.
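The answer above does not cover two remaining symptoms in the question: the "source" column coming back as a string, and the escaped forward slashes. One possible way to handle them is sketched below. This is an assumption-laden sketch rather than part of the accepted answer: it assumes the "source" cells still hold valid Python literals after the manual CSV edit (so ast.literal_eval can parse them), that empty cells should stay empty strings rather than become null, and it swaps DataFrame.to_json for the standard json module, which does not escape "/". The sample records and the in-memory buffer stand in for the real files.

```python
import ast
import io
import json

import pandas as pd

# Two records in the same shape as the original JSON (illustrative subset).
original = [
    {
        "tag": "# (pound sign)",
        "definition": "",
        "source": [
            {
                "title": "Comment with #",
                "URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch04.html#idm45795038172984",
            }
        ],
    },
    {"tag": "{ } (curly brackets)", "definition": "", "source": []},
]

# DataFrame -> CSV without the index column.
csv_text = pd.DataFrame(original).to_csv(index=False)

# CSV -> DataFrame; keep_default_na=False keeps empty cells as "" instead of NaN.
cleaned = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# The nested lists were stored as their Python repr; parse them back into lists.
cleaned["source"] = cleaned["source"].apply(ast.literal_eval)

# Serialize with the standard library, which leaves forward slashes unescaped.
records = cleaned.to_dict(orient="records")
json_text = json.dumps(records, indent=4)

restored = json.loads(json_text)
print(restored == original)
```

To write the result to disk, `json.dump(records, f, indent=4)` on an open file handle would be used in place of `json.dumps`.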

Votes 2
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/70807989