首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >整理嵌套字典并将其转换为Dataframe的列

整理嵌套字典并将其转换为Dataframe的列
EN

Stack Overflow用户
提问于 2020-08-19 07:58:12
回答 1查看 4.3K关注 0票数 4

我有一个JSON,我把它转换成了一个字典,并试图用它制作一个数据格式。问题在于它是多个嵌套的,而且数据不一致。

例如:

代码语言:javascript
复制
d = """[
      {
        "id": 51,
        "kits": [
            {
                "id": 57,
                "kit": "KIT1182A",
                "items": [
                    {
                        "id": 254,
                        "product": {
                            "name": "Plastic Pallet",
                            "short_code": "PP001",
                            "priceperunit": 2500,
                            "volumetric_weight": 21.34
                        },
                        "quantity": 5
                    },
                    {
                        "id": 258,
                        "product": {
                            "name": "Separator Sheet",
                            "short_code": "FSS001",
                            "priceperunit": 170,
                            "volumetric_weight": 0.9
                        },
                        "quantity": 18
                    }
                ],
                "quantity": 5
            },                                     #end of kit
            {
                "id": 58,
                "kit": "KIT1182B",
                "items": [
                    {
                        "id": 259,
                        "product": {
                            "name": "Plastic Pallet",
                            "short_code": "PP001",
                            "priceperunit": 2500,
                            "volumetric_weight": 21.34
                        },
                        "quantity": 5
                    },
                    {
                        "id": 260,
                        "product": {
                            "name": "Plastic Sidewall",
                            "short_code": "PS001",
                            "priceperunit": 1250,
                            "volumetric_weight": 16.1
                        },
                        "quantity": 5
                    },
                    {
                        "id": 261,
                        "product": {
                            "name": "Plastic Lid",
                            "short_code": "PL001",
                            "priceperunit": 1250,
                            "volumetric_weight": 9.7
                        },
                        "quantity": 5
                    }
                   
                ],
                "quantity": 7
            }                                    #end of kit
        ],
        "warehouse": "Yantraksh Logistics Private limited_GGNPC1",
        "receiver_client": "Lumax Cornaglia Auto Tech Private Limited",
        "transport_by": "Kiran Roadways",
        "transaction_type": "Return",
        "transaction_date": "2020-08-13T04:34:11.678000Z",
        "transaction_no": 1180,
        "is_delivered": false,
        "driver_name": "__________",
        "driver_number": "__________",
        "lr_number": 0,
        "vehicle_number": "__________",
        "freight_charges": 0,
        "vehicle_type": "Part Load",
        "remarks": "0",
        "flow": 36,
        "owner": 2
    } ]"""

我想把它转换成一个数据文件,如下所示:

代码语言:javascript
复制
transaction_no  is_delivered    flow    transaction_date    receiver_client warehouse   kits    quantity    product1    quantity1   product2    quantity2   product3    quantity3
1180       False    36  2020-08-13T04:34:11.678000Z Lumax Cornaglia Auto Tech Private Limited   Yantraksh Logistics Private limited_GGNPC1  KIT1182A    5   PP001   5   FSS001  18  NaN NaN
1180       False    36  2020-08-13T04:34:11.678000Z Lumax Cornaglia Auto Tech Private Limited   Yantraksh Logistics Private limited_GGNPC1  KIT1182B    7   PP001   5   PS001   5   PL001   7.0

或者以一种更好的方式表现出来:

我所做的:

代码语言:javascript
复制
data = json.loads(d)
result_dataframe = pd.DataFrame(data)
l = ['transaction_no', 'is_delivered','flow', 'transaction_date', 'receiver_client', 'warehouse','kits']  #fields that I need
result_dataframe = result_dataframe[l]
result_dataframe.to_csv("out.csv")

我试过:

代码语言:javascript
复制
def flatten(input_dict, separator='_', prefix=''):
output_dict = {}
for key, value in input_dict.items():
    if isinstance(value, dict) and value:
        deeper = flatten(value, separator, prefix+key+separator)
        output_dict.update({key2: val2 for key2, val2 in deeper.items()})
    elif isinstance(value, list) and value:
        for index, sublist in enumerate(value, start=1):
            if isinstance(sublist, dict) and sublist:
                deeper = flatten(sublist, separator, prefix+key+separator+str(index)+separator)
                output_dict.update({key2: val2 for key2, val2 in deeper.items()})
            else:
                output_dict[prefix+key+separator+str(index)] = value
    else:
        output_dict[prefix+key] = value
return output_dict

但是它给出了单个行中的所有值,我如何在工具包的基础上分离它们并得到结果?

EN

回答 1

Stack Overflow用户

发布于 2020-08-19 09:42:20

熊猫规格化功能应该是您正在寻找的。

下面是如何规范您的Json输入:

代码语言:javascript
复制
pd.json_normalize(data, ['kits', 'items'], 
                  [['kits', 'kit'], 'transaction_no', 'is_delivered','flow', 'transaction_date', 'receiver_client', 'warehouse'], 
                  errors='ignore', record_prefix='kits.')
  • 第一个参数是您的数据集。
  • 第二个参数是记录路径,您希望输入的嵌套数据的级别。
  • 第三个参数是要添加到输入中的元数据的路径。

根据产品被列分割,您应该尝试做一个枢轴表。

祝好运。

票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/63482476

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档