首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >扁平嵌套字典并将其转换为列

扁平嵌套字典并将其转换为列
EN

Stack Overflow用户
提问于 2020-08-13 06:27:13
回答 2查看 231关注 0票数 0

我有一个JSON,我把它转换成了一个字典,并试图用它制作一个数据格式。问题在于它是多个嵌套的,而且数据不一致。

例如:

代码语言:javascript
复制
d = """[
      {
        "id": 51,
        "kits": [
            {
                "id": 57,
                "kit": "KIT1182A",
                "items": [
                    {
                        "id": 254,
                        "product": {
                            "name": "Plastic Pallet",
                            "short_code": "PP001",
                            "priceperunit": 2500,
                            "volumetric_weight": 21.34
                        },
                        "quantity": 5
                    },
                    {
                        "id": 258,
                        "product": {
                            "name": "Separator Sheet",
                            "short_code": "FSS001",
                            "priceperunit": 170,
                            "volumetric_weight": 0.9
                        },
                        "quantity": 18
                    }
                ],
                "quantity": 5
            },                                     #end of kit
            {
                "id": 58,
                "kit": "KIT1182B",
                "items": [
                    {
                        "id": 259,
                        "product": {
                            "name": "Plastic Pallet",
                            "short_code": "PP001",
                            "priceperunit": 2500,
                            "volumetric_weight": 21.34
                        },
                        "quantity": 5
                    },
                    {
                        "id": 260,
                        "product": {
                            "name": "Plastic Sidewall",
                            "short_code": "PS001",
                            "priceperunit": 1250,
                            "volumetric_weight": 16.1
                        },
                        "quantity": 5
                    },
                    {
                        "id": 261,
                        "product": {
                            "name": "Plastic Lid",
                            "short_code": "PL001",
                            "priceperunit": 1250,
                            "volumetric_weight": 9.7
                        },
                        "quantity": 5
                    }
                   
                ],
                "quantity": 7
            }                                    #end of kit
        ],
        "warehouse": "Yantraksh Logistics Private limited_GGNPC1",
        "receiver_client": "Lumax Cornaglia Auto Tech Private Limited",
        "transport_by": "Kiran Roadways",
        "transaction_type": "Return",
        "transaction_date": "2020-08-13T04:34:11.678000Z",
        "transaction_no": 1180,
        "is_delivered": false,
        "driver_name": "__________",
        "driver_number": "__________",
        "lr_number": 0,
        "vehicle_number": "__________",
        "freight_charges": 0,
        "vehicle_type": "Part Load",
        "remarks": "0",
        "flow": 36,
        "owner": 2
    } ]"""

我想把它转换成一个数据文件,如下所示:

代码语言:javascript
复制
transaction_no  is_delivered    flow    transaction_date    receiver_client warehouse   kits    quantity    product1    quantity1   product2    quantity2   product3    quantity3
1180       False    36  2020-08-13T04:34:11.678000Z Lumax Cornaglia Auto Tech Private Limited   Yantraksh Logistics Private limited_GGNPC1  KIT1182A    5   PP001   5   FSS001  18  NaN NaN
1180       False    36  2020-08-13T04:34:11.678000Z Lumax Cornaglia Auto Tech Private Limited   Yantraksh Logistics Private limited_GGNPC1  KIT1182B    7   PP001   5   PS001   5   PL001   7.0

或者以一种更好的方式表现出来:

我所做的:

代码语言:javascript
复制
data = json.loads(d)
result_dataframe = pd.DataFrame(data)
l = ['transaction_no', 'is_delivered','flow', 'transaction_date', 'receiver_client', 'warehouse','kits']  #fields that I need
result_dataframe = result_dataframe[l]
result_dataframe.to_csv("out.csv")

我试过:

代码语言:javascript
复制
def flatten(input_dict, separator='_', prefix=''):
output_dict = {}
for key, value in input_dict.items():
    if isinstance(value, dict) and value:
        deeper = flatten(value, separator, prefix+key+separator)
        output_dict.update({key2: val2 for key2, val2 in deeper.items()})
    elif isinstance(value, list) and value:
        for index, sublist in enumerate(value, start=1):
            if isinstance(sublist, dict) and sublist:
                deeper = flatten(sublist, separator, prefix+key+separator+str(index)+separator)
                output_dict.update({key2: val2 for key2, val2 in deeper.items()})
            else:
                output_dict[prefix+key+separator+str(index)] = value
    else:
        output_dict[prefix+key] = value
return output_dict

但是它给出了单个行中的所有值,我如何在工具包的基础上分离它们并得到结果?

EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2020-08-20 17:05:02

像上面这样的数据转换非常常见。熊猫提供了很多工具来帮助你完成这项任务。

代码语言:javascript
复制
data = json.loads(d)
df = pd.json_normalize(data, record_path=['kits'], meta= ['transaction_no', 'is_delivered','flow', 'transaction_date', 'receiver_client', 'warehouse'])  # line 1
df = df.explode('items')  # line 2
df[['product_code', 'product_quantity']] = df['items'].apply(lambda x: pd.Series([x['product']['short_code'], x['quantity']]))  # line 3
df.drop(columns=['items'])  # line 4

会为你提供

代码语言:javascript
复制
  id       kit  quantity transaction_no is_delivered flow             transaction_date                            receiver_client                                   warehouse product_code  product_quantity
0  57  KIT1182A  5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PP001        5               
0  57  KIT1182A  5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  FSS001       18              
1  58  KIT1182B  7         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PP001        5               
1  58  KIT1182B  7         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PS001        5               
1  58  KIT1182B  7         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PL001        5               

实际上,诀窍只存在于pd.json_normalize (第1行)中。它将创建一个与您所要求的数据非常相似的数据:

代码语言:javascript
复制
   id       kit                                                                                                                                                                                                                                                                                                                                                                                                                             items  quantity transaction_no is_delivered flow             transaction_date                            receiver_client                                   warehouse
0  57  KIT1182A  [{'id': 254, 'product': {'name': 'Plastic Pallet', 'short_code': 'PP001', 'priceperunit': 2500, 'volumetric_weight': 21.34}, 'quantity': 5}, {'id': 258, 'product': {'name': 'Separator Sheet', 'short_code': 'FSS001', 'priceperunit': 170, 'volumetric_weight': 0.9}, 'quantity': 18}]                                                                                                                                          5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1
1  58  KIT1182B  [{'id': 259, 'product': {'name': 'Plastic Pallet', 'short_code': 'PP001', 'priceperunit': 2500, 'volumetric_weight': 21.34}, 'quantity': 5}, {'id': 260, 'product': {'name': 'Plastic Sidewall', 'short_code': 'PS001', 'priceperunit': 1250, 'volumetric_weight': 16.1}, 'quantity': 5}, {'id': 261, 'product': {'name': 'Plastic Lid', 'short_code': 'PL001', 'priceperunit': 1250, 'volumetric_weight': 9.7}, 'quantity': 5}]  7         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1

items在字典中包含有关产品的所有数据。可以以类似于第3行的方式扩展它,但我建议强烈反对它。我稍后再解释原因。因此,第2行根据工具包中物品的数量对每一行进行爆破。第三行提取prod_nameprod_quantity,最后一行提取原始数据。

那么,为什么不应该有一个列数量可变的表呢?你永远不会知道你在每个工具包中有多少物品。您将篡改这些变量列的值。这比字典里有信息还要糟糕。

更新

要按您所要求的方式获得结果,只需运行以下命令:

代码语言:javascript
复制
data = json.loads(d)
df = pd.json_normalize(data, record_path=['kits'], meta= ['transaction_no', 'is_delivered','flow', 'transaction_date', 'receiver_client', 'warehouse'] )
tmp = df['items'].apply(lambda it: [{'product'+str(indx+1):x['product']['short_code'], 'quantity'+str(indx+1):x['quantity']} for indx,x in enumerate(it)])
tmp = tmp.apply(lambda x :  {k:el[k] for el in x for k in el})
tmp = pd.DataFrame.from_records(tmp)
df = pd.concat([df, tmp], axis=1)
df = df.drop(columns=['items', 'id'])

根据您在网上发布的数据,结果是:

代码语言:javascript
复制
        kit  quantity transaction_no is_delivered flow             transaction_date                            receiver_client                                   warehouse product1  quantity1 product2  quantity2 product3  quantity3  product4  quantity4 product5  quantity5
0  KIT1182A  5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PP001    5          PS002    5          PL001    5          FIN1182A  30         FSS001   18       
1  KIT1182B  5         1180           False        36   2020-08-13T04:34:11.678000Z  Lumax Cornaglia Auto Tech Private Limited  Yantraksh Logistics Private limited_GGNPC1  PP001    5          PS001    5          PL001    5          FIN1182B  20         FSS001   25       
2  KIT1151   14        1179           False        1    2020-08-11T04:31:31.245000Z  Mahindra & Mahindra_Kandivali              Yantraksh Logistics Private limited_GGNPC1  PP001    14         PS001    14         PL001    14         FIN1151A  28         FSS001   42       
3  KIT1151   15        1178           False        32   2020-08-10T04:30:12.022000Z  Mahindra Vehicle Manufacturers Pune        Yantraksh Logistics Private limited_GGNPC1  PP001    15         PS001    15         PL001    15         FIN1151A  29         FSS001   43       
票数 6
EN

Stack Overflow用户

发布于 2020-08-20 23:19:23

使用json_normalize,您可以得到以下内容:

代码语言:javascript
复制
data = json.loads(d)
df = pd.json_normalize(data,
                       record_path=['kits', 'items'],
                       meta=[
                         ['kits', 'kit'], 
                         ['id'], 
                         ['kits', 'quantity'],
                         ['warehouse'],
                         ['receiver_client']
                      ],
                       meta_prefix='top')
print(df)


    id  quantity      product.name  ... topkits.quantity                                topwarehouse                         topreceiver_client
0  254         5    Plastic Pallet  ...                5  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited
1  258        18   Separator Sheet  ...                5  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited
2  259         5    Plastic Pallet  ...                7  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited
3  260         5  Plastic Sidewall  ...                7  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited
4  261         5       Plastic Lid  ...                7  Yantraksh Logistics Private limited_GGNPC1  Lumax Cornaglia Auto Tech Private Limited
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/63389329

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档