文章/答案/技术大牛

发布

社区首页 >问答首页 >扁平嵌套字典并将其转换为列

问扁平嵌套字典并将其转换为列
EN

Stack Overflow用户

提问于 2020-08-13 06:27:13

回答 2查看 231关注 0票数 0

我有一个JSON，我把它转换成了一个字典，并试图用它制作一个数据格式。问题在于它是多个嵌套的，而且数据不一致。

例如：

d = """[
      {
        "id": 51,
        "kits": [
            {
                "id": 57,
                "kit": "KIT1182A",
                "items": [
                    {
                        "id": 254,
                        "product": {
                            "name": "Plastic Pallet",
                            "short_code": "PP001",
                            "priceperunit": 2500,
                            "volumetric_weight": 21.34
                        },
                        "quantity": 5
                    },
                    {
                        "id": 258,
                        "product": {
                            "name": "Separator Sheet",
                            "short_code": "FSS001",
                            "priceperunit": 170,
                            "volumetric_weight": 0.9
                        },
                        "quantity": 18
                    }
                ],
                "quantity": 5
            },                                     #end of kit
            {
                "id": 58,
                "kit": "KIT1182B",
                "items": [
                    {
                        "id": 259,
                        "product": {
                            "name": "Plastic Pallet",
                            "short_code": "PP001",
                            "priceperunit": 2500,
                            "volumetric_weight": 21.34
                        },
                        "quantity": 5
                    },
                    {
                        "id": 260,
                        "product": {
                            "name": "Plastic Sidewall",
                            "short_code": "PS001",
                            "priceperunit": 1250,
                            "volumetric_weight": 16.1
                        },
                        "quantity": 5
                    },
                    {
                        "id": 261,
                        "product": {
                            "name": "Plastic Lid",
                            "short_code": "PL001",
                            "priceperunit": 1250,
                            "volumetric_weight": 9.7
                        },
                        "quantity": 5
                    }
                   
                ],
                "quantity": 7
            }                                    #end of kit
        ],
        "warehouse": "Yantraksh Logistics Private limited_GGNPC1",
        "receiver_client": "Lumax Cornaglia Auto Tech Private Limited",
        "transport_by": "Kiran Roadways",
        "transaction_type": "Return",
        "transaction_date": "2020-08-13T04:34:11.678000Z",
        "transaction_no": 1180,
        "is_delivered": false,
        "driver_name": "__________",
        "driver_number": "__________",
        "lr_number": 0,
        "vehicle_number": "__________",
        "freight_charges": 0,
        "vehicle_type": "Part Load",
        "remarks": "0",
        "flow": 36,
        "owner": 2
    } ]"""

我想把它转换成一个数据文件，如下所示：

transaction_no  is_delivered    flow    transaction_date    receiver_client warehouse   kits    quantity    product1    quantity1   product2    quantity2   product3    quantity3
1180       False    36  2020-08-13T04:34:11.678000Z Lumax Cornaglia Auto Tech Private Limited   Yantraksh Logistics Private limited_GGNPC1  KIT1182A    5   PP001   5   FSS001  18  NaN NaN
1180       False    36  2020-08-13T04:34:11.678000Z Lumax Cornaglia Auto Tech Private Limited   Yantraksh Logistics Private limited_GGNPC1  KIT1182B    7   PP001   5   PS001   5   PL001   7.0

或者以一种更好的方式表现出来：

我所做的：

data = json.loads(d)
result_dataframe = pd.DataFrame(data)
l = ['transaction_no', 'is_delivered','flow', 'transaction_date', 'receiver_client', 'warehouse','kits']  #fields that I need
result_dataframe = result_dataframe[l]
result_dataframe.to_csv("out.csv")

我试过：

def flatten(input_dict, separator='_', prefix=''):
output_dict = {}
for key, value in input_dict.items():
    if isinstance(value, dict) and value:
        deeper = flatten(value, separator, prefix+key+separator)
        output_dict.update({key2: val2 for key2, val2 in deeper.items()})
    elif isinstance(value, list) and value:
        for index, sublist in enumerate(value, start=1):
            if isinstance(sublist, dict) and sublist:
                deeper = flatten(sublist, separator, prefix+key+separator+str(index)+separator)
                output_dict.update({key2: val2 for key2, val2 in deeper.items()})
            else:
                output_dict[prefix+key+separator+str(index)] = value
    else:
        output_dict[prefix+key] = value
return output_dict

但是它给出了单个行中的所有值，我如何在工具包的基础上分离它们并得到结果？

python

json

pandas

回答 2

Stack Overflow用户

回答已采纳

发布于 2020-08-20 17:05:02

像上面这样的数据转换非常常见。熊猫提供了很多工具来帮助你完成这项任务。

data = json.loads(d)
df = pd.json_normalize(data, record_path=['kits'], meta= ['transaction_no', 'is_delivered','flow', 'transaction_date', 'receiver_client', 'warehouse'])  # line 1
df = df.explode('items')  # line 2
df[['product_code', 'product_quantity']] = df['items'].apply(lambda x: pd.Series([x['product']['short_code'], x['quantity']]))  # line 3
df.drop(columns=['items'])  # line 4

会为你提供

  id       kit  quantity transaction_no is_delivered flow             transaction_date                            receiver_client                                   warehouse product_code  product_quantity
0  57  KIT1182A  5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PP001        5               
0  57  KIT1182A  5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  FSS001       18              
1  58  KIT1182B  7         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PP001        5               
1  58  KIT1182B  7         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PS001        5               
1  58  KIT1182B  7         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PL001        5

实际上，诀窍只存在于pd.json_normalize (第1行)中。它将创建一个与您所要求的数据非常相似的数据：

   id       kit                                                                                                                                                                                                                                                                                                                                                                                                                             items  quantity transaction_no is_delivered flow             transaction_date                            receiver_client                                   warehouse
0  57  KIT1182A  [{'id': 254, 'product': {'name': 'Plastic Pallet', 'short_code': 'PP001', 'priceperunit': 2500, 'volumetric_weight': 21.34}, 'quantity': 5}, {'id': 258, 'product': {'name': 'Separator Sheet', 'short_code': 'FSS001', 'priceperunit': 170, 'volumetric_weight': 0.9}, 'quantity': 18}]                                                                                                                                          5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1
1  58  KIT1182B  [{'id': 259, 'product': {'name': 'Plastic Pallet', 'short_code': 'PP001', 'priceperunit': 2500, 'volumetric_weight': 21.34}, 'quantity': 5}, {'id': 260, 'product': {'name': 'Plastic Sidewall', 'short_code': 'PS001', 'priceperunit': 1250, 'volumetric_weight': 16.1}, 'quantity': 5}, {'id': 261, 'product': {'name': 'Plastic Lid', 'short_code': 'PL001', 'priceperunit': 1250, 'volumetric_weight': 9.7}, 'quantity': 5}]  7         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1

列items在字典中包含有关产品的所有数据。可以以类似于第3行的方式扩展它，但我建议强烈反对它。我稍后再解释原因。因此，第2行根据工具包中物品的数量对每一行进行爆破。第三行提取prod_name和prod_quantity，最后一行提取原始数据。

那么，为什么不应该有一个列数量可变的表呢？你永远不会知道你在每个工具包中有多少物品。您将篡改这些变量列的值。这比字典里有信息还要糟糕。

更新

要按您所要求的方式获得结果，只需运行以下命令：

data = json.loads(d)
df = pd.json_normalize(data, record_path=['kits'], meta= ['transaction_no', 'is_delivered','flow', 'transaction_date', 'receiver_client', 'warehouse'] )
tmp = df['items'].apply(lambda it: [{'product'+str(indx+1):x['product']['short_code'], 'quantity'+str(indx+1):x['quantity']} for indx,x in enumerate(it)])
tmp = tmp.apply(lambda x :  {k:el[k] for el in x for k in el})
tmp = pd.DataFrame.from_records(tmp)
df = pd.concat([df, tmp], axis=1)
df = df.drop(columns=['items', 'id'])

根据您在网上发布的数据，结果是：

        kit  quantity transaction_no is_delivered flow             transaction_date                            receiver_client                                   warehouse product1  quantity1 product2  quantity2 product3  quantity3  product4  quantity4 product5  quantity5
0  KIT1182A  5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PP001    5          PS002    5          PL001    5          FIN1182A  30         FSS001   18       
1  KIT1182B  5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PP001    5          PS001    5          PL001    5          FIN1182B  20         FSS001   25       
2  KIT1151   14        1179           False        1    2020-08-11T04:31:31.245000Z  Mahindra & Mahindra_Kandivali              Yantraksh Logistics Private limited_GGNPC1  PP001    14         PS001    14         PL001    14         FIN1151A  28         FSS001   42       
3  KIT1151   15        1178           False        32   2020-08-10T04:30:12.022000Z  Mahindra Vehicle Manufacturers Pune        Yantraksh Logistics Private limited_GGNPC1  PP001    15         PS001    15         PL001    15         FIN1151A  29         FSS001   43

票数 6

Stack Overflow用户

发布于 2020-08-20 23:19:23

使用json_normalize，您可以得到以下内容：

data = json.loads(d)
df = pd.json_normalize(data,
                       record_path=['kits', 'items'],
                       meta=[
                         ['kits', 'kit'], 
                         ['id'], 
                         ['kits', 'quantity'],
                         ['warehouse'],
                         ['receiver_client']
                      ],
                       meta_prefix='top')
print(df)


    id  quantity      product.name  ... topkits.quantity                                topwarehouse                         topreceiver_client
0  254         5    Plastic Pallet  ...                5  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited
1  258        18   Separator Sheet  ...                5  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited
2  259         5    Plastic Pallet  ...                7  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited
3  260         5  Plastic Sidewall  ...                7  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited
4  261         5       Plastic Lid  ...                7  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/63389329

复制

相似问题

问扁平嵌套字典并将其转换为列
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问扁平嵌套字典并将其转换为列EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问扁平嵌套字典并将其转换为列
EN