Q: How to efficiently fix a JSON file converted from a pandas DataFrame

Stack Overflow user

Asked 2022-01-21 21:51:46

1 answer, 206 views, 0 followers, 0 votes

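I have a JSON file that I read into pandas and convert to a DataFrame. I then export it as a CSV so that I can edit it more easily. When I'm done, I read the CSV back into a DataFrame and want to convert it back to a JSON file. However, in the process, a lot of extra data is automatically added to my original list of dictionaries (the JSON file).

I'm sure I could hack together a fix, but I'm wondering whether anyone knows an efficient way to handle this process so that no new data or columns are added to my original JSON data?

Original JSON (snippet):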

  [
    {
        "tag": "!= (not-equal-to operator)",
        "definition": "",
        "source": [
            {
                "title": "Compare Dictionaries",
                "URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch08.html#idm45795007002280"
            }
        ]
    },
    {
        "tag": "\"intelligent\" applications",
        "definition": "",
        "source": [
            {
                "title": "Why Machine Learning?",
                "URL": "https://learning.oreilly.com/library/view/introduction-to-machine/9781449369880/https://learning.oreilly.com/library/view/introduction-to-machine/9781449369880/ch01.html#idm45613685872600"
            }
        ]
    },
    {
        "tag": "# (pound sign)",
        "definition": "",
        "source": [
            {
                "title": "Comment with #",
                "URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch04.html#idm45795038172984"
            }
        ]
    },

The CSV loaded as a DataFrame (index added automatically):

    tag definition  source
0   != (not-equal-to operator)      [{'title': 'Compare Dictionaries', 'URL': 'htt...
1   "intelligent" applications      [{'title': 'Why Machine Learning?', 'URL': 'ht...
2   # (pound sign)      [{'title': 'Comment with #', 'URL': 'https://l...
3   $ (Mac/Linux prompt)        [{'title': 'Test Driving Python', 'URL': 'http...
4   $ (anchor)      [{'title': 'Patterns: Using Specifiers', 'URL'...
... ... ... ...
11375   { } (curly brackets)        []
11376   | (vertical bar)        [{'title': 'Combinations and Operators', 'URL'...
11377   || (concatenation) function (DB2/Oracle/Postgr...       [{'title': 'Discussion', 'URL': 'https://learn...
11378   || (for Oracle Database)        [{'title': 'Including special characters', 'UR...
11379   || (vertical bar, double), concatenation opera...       [{'title': 'Including special characters', 'UR...
7009 rows × 3 columns

JSON file after converting back from the CSV (bad in several ways):

{
  "0":{
    "Unnamed: 0":0,
    "tag":"!= (not-equal-to operator)",
    "definition":null,
    "source":"[{'title': 'Compare Dictionaries', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch08.html#idm45795007002280'}]"
  },
  "1":{
    "Unnamed: 0":1,
    "tag":"\"intelligent\" applications",
    "definition":null,
    "source":"[{'title': 'Why Machine Learning?', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/ch01.html#idm45613685872600'}]"
  },
  "2":{
    "Unnamed: 0":2,
    "tag":"# (pound sign)",
    "definition":null,
    "source":"[{'title': 'Comment with #', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch04.html#idm45795038172984'}]"
  },

Here is my code:

import pandas as pd
import json

# to dataframe
tags_df = pd.read_json('dsa_tags_flat.json')

# csv file was manually cleaned then reloaded here
cleaned_csv_df = pd.read_csv('dsa-parser-flat.csv')

# write to JSON
cleaned_csv_df.to_json(r'dsa-tags.json', orient='index', indent=2)

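Edit: I added index=False to the code when going from the DataFrame to CSV, which helped, but the output still has index keys that are not in the original JSON. Is there a library somewhere that can prevent all of this? Or do I just need to write some loops myself and strip them out?

Also, as you can see, the forward slashes in the URLs are escaped. That is not what I want.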

{
    "0":{
        "tag":"!= (not-equal-to operator)",
        "definition":null,
        "source":"[{'title': 'Compare Dictionaries', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch08.html#idm45795007002280'}]"
    },
    "1":{
        "tag":"\"intelligent\" applications",
        "definition":null,
        "source":"[{'title': 'Why Machine Learning?', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/ch01.html#idm45613685872600'}]"
    },
    "2":{
        "tag":"# (pound sign)",
        "definition":null,
        "source":"[{'title': 'Comment with #', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch04.html#idm45795038172984'}]"
    },
    "3":{
        "tag":"$ (Mac\/Linux prompt)",
        "definition":null,
        "source":"[{'title': 'Test Driving Python', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/data-wrangling-with\/9781491948804\/ch01.html#idm140080973230480'}]"
    },

pandas

python

json

1 Answer

Stack Overflow user

Accepted answer

Posted 2022-01-21 22:07:18

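The problem is that you are adding an index in two places.

The first is when you write the file to CSV. This adds the "Unnamed: 0" field to the final JSON file. You can either pass index=False to the to_csv method when writing the CSV to disk, or specify the index_col parameter when reading the saved CSV with read_csv.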
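The first point can be reproduced with a small sketch. The two-row DataFrame below is a stand-in for the asker's data, and the CSV is kept in memory with `io.StringIO` instead of being written to disk; both are assumptions made only for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for the asker's table (illustrative data only).
df = pd.DataFrame({"tag": ["# (pound sign)", "$ (anchor)"],
                   "definition": ["x", "y"]})

# By default to_csv writes the row index as an unnamed first column,
# which read_csv then loads back as a column called "Unnamed: 0".
with_index = df.to_csv()
reloaded = pd.read_csv(io.StringIO(with_index))

# Option 1: do not write the index at all.
fixed_a = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Option 2: keep the CSV as it is, but read the first column back as the index.
fixed_b = pd.read_csv(io.StringIO(with_index), index_col=0)

print(list(reloaded.columns))
print(list(fixed_a.columns))
print(list(fixed_b.columns))
```

Either option removes the extra column. Option 2 is useful when the CSV has already been written and hand-edited with the index column in place.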

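The second is when you write the DataFrame to JSON with orient="index". This adds the outermost index keys, such as "0" and "1", to the final JSON file. If you intend to save the JSON in a format similar to the one you loaded, you should use orient="records".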
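A minimal comparison of the two `orient` values, using a made-up two-row DataFrame (the data is hypothetical; the `orient` values are the ones discussed above):

```python
import json

import pandas as pd

df = pd.DataFrame({"tag": ["a", "b"], "definition": ["x", "y"]})

# orient="index" nests every row under its index label.
as_index = json.loads(df.to_json(orient="index"))

# orient="records" produces a plain list of row dictionaries,
# matching the shape of the original file in the question.
as_records = json.loads(df.to_json(orient="records"))

print(as_index)
print(as_records)
```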

To understand how the orient parameter works, see the documentation for to_json.

Votes: 2
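The question's output shows three further differences that the points above do not cover: "source" comes back as a string, empty definitions become null, and URL slashes are escaped. The following is one possible sketch of a full round trip, not a definitive fix. It assumes the hand-edited "source" cells are still valid Python literals (so `ast.literal_eval` can parse them), and it swaps `to_json` for the standard library's `json.dumps`, which does not escape forward slashes. The sample records are shortened stand-ins for the real file.

```python
import ast
import io
import json

import pandas as pd

# Shortened stand-in for the original list of dictionaries.
original = [
    {"tag": "# (pound sign)", "definition": "",
     "source": [{"title": "Comment with #",
                 "URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch04.html#idm45795038172984"}]},
    {"tag": "{ } (curly brackets)", "definition": "", "source": []},
]

# DataFrame -> CSV without the index column (kept in memory here).
csv_text = pd.DataFrame(original).to_csv(index=False)

# keep_default_na=False keeps empty cells as "" instead of NaN/null.
cleaned = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# The CSV stores each list as its Python repr; parse it back into a list.
cleaned["source"] = cleaned["source"].apply(ast.literal_eval)

# Convert to a list of dicts and serialize with the json module,
# which leaves "/" unescaped.
records = cleaned.to_dict(orient="records")
json_text = json.dumps(records, indent=4)

print(json_text)
```

To write the result to a file, `json.dump(records, f, indent=4)` on an open file handle would work the same way.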

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/70807989
