
Converting a dictionary to a DataFrame

Stack Overflow user
Asked 2021-02-14 16:43:56
2 answers · 89 views · 0 followers · score 1

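Hi, I am trying to convert a dictionary into a DataFrame. The dictionary contains the results of a search on Amazon (I am using an API). I want each product to be one row in the DataFrame, with the keys as column headers. However, there are some keys at the beginning that I am not interested in having in the table.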

Below is the code that converts the JSON to a dictionary and then converts that to a DataFrame.

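import json

import pandas as pd
from pandas.io.json import json_normalize

filename = api_result.json()

def convert_json_to_dict(filename):
    # open the JSON file and load its contents into a dictionary
    with open(filename) as JSON:
        json_dict = json.load(JSON)
    return json_dict


def convert_dict_to_df(filename):
    # flatten the loaded dictionary into a DataFrame
    return pd.json_normalize(convert_json_to_dict(filename))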

Here is part of the data in the dictionary (2 of the 25 products).

filename = {
    'request_info': {
        'credits_remaining': 72,
        'credits_used': 28,
        'credits_used_this_request': 1,
        'success': True
    },
    'request_metadata': {
        'amazon_url': 'https://www.amazon.com/s/?k=memory+cards&ref=nb_sb_noss_2',
        'created_at': '2021-02-14T15:09:04.802Z',
        'processed_at': '2021-02-14T15:09:11.003Z',
        'timing': ['global_init - 0ms (total 0ms)',
            'auth_apikey - 35ms (total 35ms)',
            'auth_retrieve_plan - 20ms (total 56ms)',
            'auth_retrieve_credit_usage - 22ms (total '
            '79ms)',
            'processing_invoking_worker - 31ms (total '
            '111ms)',
            'processing_execution_complete - 6202ms '
            '(total 6313ms)',
            'auth_credit_usage_reconcile - 81ms (total '
            '6394ms)',
            'global_end - 0ms (total 6394ms)'],
        'total_time_taken': 6.2
    },
    'request_parameters': {
        'amazon_domain': 'amazon.com',
        'search_term': 'memory cards',
        'type': 'search'
    },
    'search_results': [{
            'asin': 'B08L26TYQ3',
            'categories': [{
                    'id': 'search-alias=aps',
                    'name': 'All Departments'
                }
            ],
            'delivery': {
                'price': {
                    'currency': 'USD',
                    'is_free': True,
                    'raw': 'FREE Shipping by Amazon',
                    'symbol': '$',
                    'value': 0
                },
                'tagline': 'Get it as soon as Tue, Feb 16'
            },
            'image': 'https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg',
            'is_amazon_fresh': False,
            'is_prime': True,
            'is_whole_foods_market': False,
            'link': 'https://www.amazon.com/dp/B08L26TYQ3',
            'position': 1,
            'price': {
                'currency': 'USD',
                'raw': '$29.99',
                'symbol': '$',
                'value': 29.99
            },
            'prices': [{
                    'currency': 'USD',
                    'raw': '$29.99',
                    'symbol': '$',
                    'value': 29.99
                }
            ],
            'rating': 4.3,
            'ratings_total': 74,
            'sponsored': True,
            'title': 'Micro Center Premium 256GB SDXC Card Class 10 '
            'SD Flash Memory Card UHS-I C10 U3 V30 4K UHD '
            'Video R/W Speed up to 80 MB/s for Cameras '
            'Computers Trail Cams (256GB)'
        }, {
            'asin': 'B08N46XMPH',
            'categories': [{
                    'id': 'search-alias=aps',
                    'name': 'All Departments'
                }
            ],
            'delivery': {
                'price': {
                    'currency': 'USD',
                    'is_free': True,
                    'raw': 'FREE Shipping on orders '
                    'over $25 shipped by Amazon',
                    'symbol': '$',
                    'value': 0
                },
                'tagline': 'Get it as soon as Tue, Feb 16'
            },
            'image': 'https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg',
            'is_amazon_fresh': False,
            'is_prime': True,
            'is_whole_foods_market': False,
            'link': 'https://www.amazon.com/dp/B08N46XMPH',
            'position': 2,
            'price': {
                'currency': 'USD',
                'raw': '$22.68',
                'symbol': '$',
                'value': 22.68
            },
            'prices': [{
                    'currency': 'USD',
                    'raw': '$22.68',
                    'symbol': '$',
                    'value': 22.68
                }
            ],
            'rating': 4.8,
            'ratings_total': 16,
            'sponsored': True,
            'title': '256GB Micro SD Memory Card SD Memory Card/TF '
            'Card Class 10 High Speed Card with Adapter for '
            'Camera, Phone, Computer, Dash Came, '
            'Surveillance,Drone'
        }
    ]
}

The DataFrame should look like the following, although with more columns:

    search_term   Position   ASIN           Categories          Price   Currency
1   memory cards  1          B08L26TYQ3     All Departments     29.99   USD
2   memory cards  2          B08N46XMPH     All Departments     22.68   USD

I have already tried the answers to this question, but they did not work: "Convert Python to a DataFrame"


2 Answers

Stack Overflow user

Accepted answer

Posted on 2021-02-14 18:58:54

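  • json_normalize is no longer imported from pandas.io.json. It now lives in the top-level namespace.
    • Depending on your environment, update pandas to the current version with pip or conda.

  • Most of the required information is under the 'search_results' key, but 'search_term' is nested inside 'request_parameters', so that key must be passed as a list to the meta parameter of pandas.json_normalize.
  • The information in the 'prices' column appears to overlap with existing data in other columns.
    • The column is normalized below, but it could be removed because it contains no new information.

  • Unwanted columns can be removed with pandas.DataFrame.drop, or only the desired columns can be selected with pandas.DataFrame.loc, as shown below.
  • According to this timing analysis question, df.join(pd.DataFrame(df.pop(col).values.tolist())) is the fastest way to normalize a single-level dict from a column and join it back to the main DataFrame, but this answer shows how to deal with problematic columns (e.g. ones that cause an error when trying .values.tolist()).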
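
For reference, the pop-and-join pattern from the last point can be sketched on a small toy frame. This is only an illustration and not part of the code for the question's data: the frame, its values, and the rule of treating any non-dict entry as an empty dict are all assumptions made here to show one way of coping with a column that has missing values.

```python
import pandas as pd

# Toy frame (hypothetical data): 'price' holds single-level dicts, one entry is missing
df = pd.DataFrame({
    "asin": ["A1", "A2", "A3"],
    "price": [
        {"currency": "USD", "value": 29.99},
        {"currency": "USD", "value": 22.68},
        None,
    ],
})

# Remove the column from df and replace anything that is not a dict with an empty dict
price = df.pop("price").apply(lambda v: v if isinstance(v, dict) else {})

# Expand the dicts into columns, keeping the original index so the join lines up
expanded = pd.DataFrame(price.tolist(), index=df.index)
flat = df.join(expanded)

print(flat)
```

The row whose price was missing ends up with NaN in the new columns instead of raising an error.
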
import pandas as pd

# object returned from api
filename = api_result.json()

# begin by normalizing filename
main = pd.json_normalize(filename, record_path=['search_results'], meta=['request_parameters'])

# request_parameters is a column of dicts, which must be converted to individual columns for each key
x = main.join(pd.DataFrame(main.pop('request_parameters').values.tolist()))

# categories and prices are lists of dicts, which must be exploded into separate rows
x = x.apply(pd.Series.explode)

# convert the dicts in categories and prices to separate columns for each key
x = x.join(pd.DataFrame(x.pop('categories').values.tolist()))
x = x.join(pd.DataFrame(x.pop('prices').values.tolist()))

# display(x)
         asin                                                           image  is_amazon_fresh  is_prime  is_whole_foods_market                                  link  position  rating  ratings_total  sponsored                                                                                                                                                              title delivery.price.currency  delivery.price.is_free                                  delivery.price.raw delivery.price.symbol  delivery.price.value               delivery.tagline price.currency price.raw price.symbol  price.value amazon_domain   search_term    type                id             name currency     raw symbol  value
0  B08L26TYQ3  https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg            False      True                  False  https://www.amazon.com/dp/B08L26TYQ3         1     4.3             74       True  Micro Center Premium 256GB SDXC Card Class 10 SD Flash Memory Card UHS-I C10 U3 V30 4K UHD Video R/W Speed up to 80 MB/s for Cameras Computers Trail Cams (256GB)                     USD                    True                             FREE Shipping by Amazon                     $                     0  Get it as soon as Tue, Feb 16            USD    $29.99            $        29.99    amazon.com  memory cards  search  search-alias=aps  All Departments      USD  $29.99      $  29.99
1  B08N46XMPH  https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg            False      True                  False  https://www.amazon.com/dp/B08N46XMPH         2     4.8             16       True                 256GB Micro SD Memory Card SD Memory Card/TF Card Class 10 High Speed Card with Adapter for Camera, Phone, Computer, Dash Came, Surveillance,Drone                     USD                    True  FREE Shipping on orders over $25 shipped by Amazon                     $                     0  Get it as soon as Tue, Feb 16            USD    $22.68            $        22.68    amazon.com  memory cards  search  search-alias=aps  All Departments      USD  $22.68      $  22.68

# use loc to select only the desired columns and rename them as needed
final = x.loc[:, ['asin', 'search_term', 'position', 'name', 'price.currency', 'price.value']].rename(columns={'name': 'Categories', 'price.currency': 'currency', 'price.value': 'price'})

# display(final)
         asin   search_term  position       Categories currency  price
0  B08L26TYQ3  memory cards         1  All Departments      USD  29.99
1  B08N46XMPH  memory cards         2  All Departments      USD  22.68
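
A possible variation, sketched under assumptions rather than taken from the code above: the meta parameter of pandas.json_normalize also accepts a nested path, so 'search_term' can be pulled out directly instead of expanding the whole 'request_parameters' dict. The data dict below is a trimmed, hypothetical stand-in for the API response, and taking only the first category name is a simplification chosen for this example.

```python
import pandas as pd

# Trimmed, hypothetical stand-in for the API response shown in the question
data = {
    "request_parameters": {
        "amazon_domain": "amazon.com",
        "search_term": "memory cards",
        "type": "search",
    },
    "search_results": [
        {
            "asin": "B08L26TYQ3",
            "position": 1,
            "categories": [{"id": "search-alias=aps", "name": "All Departments"}],
            "price": {"currency": "USD", "value": 29.99},
            "prices": [{"currency": "USD", "value": 29.99}],
        },
        {
            "asin": "B08N46XMPH",
            "position": 2,
            "categories": [{"id": "search-alias=aps", "name": "All Departments"}],
            "price": {"currency": "USD", "value": 22.68},
            "prices": [{"currency": "USD", "value": 22.68}],
        },
    ],
}

# One row per product; the nested meta path yields a 'request_parameters.search_term' column
df = pd.json_normalize(
    data,
    record_path="search_results",
    meta=[["request_parameters", "search_term"]],
)

# 'prices' repeats what 'price' already holds, so drop it
df = df.drop(columns=["prices"])

# Simplification: keep only the name of the first category, if there is one
df["categories"] = df["categories"].apply(lambda cats: cats[0]["name"] if cats else None)

# Select and rename the columns of interest
final = df.loc[
    :,
    ["request_parameters.search_term", "position", "asin", "categories", "price.value", "price.currency"],
].rename(
    columns={
        "request_parameters.search_term": "search_term",
        "categories": "category",
        "price.value": "price",
        "price.currency": "currency",
    }
)

print(final)
```

This avoids the explode step, at the cost of ignoring any categories after the first one.
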
Score: 1

Stack Overflow user

Posted on 2021-02-14 16:48:23

From the official documentation:

import pandas as pd
df = pd.read_json(r'/path/to/filename.json')

Or try this:

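import json

import pandas as pd

# parse the JSON text of the response, then build a DataFrame from the resulting dict
str_json = json.loads(filename.text)
df = pd.DataFrame.from_dict(str_json)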
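
To make the difference between a plain DataFrame constructor and json_normalize concrete, here is a small sketch. The JSON string is a hypothetical, heavily trimmed stand-in for the response body, not real API output.

```python
import json

import pandas as pd

# Hypothetical JSON text standing in for the body of the API response
raw = """
{"search_results": [
    {"asin": "B08L26TYQ3", "price": {"currency": "USD", "value": 29.99}},
    {"asin": "B08N46XMPH", "price": {"currency": "USD", "value": 22.68}}
]}
"""
data = json.loads(raw)

# Plain constructor: one row per product, but 'price' stays a column of dicts
nested = pd.DataFrame(data["search_results"])

# json_normalize: nested dicts are flattened into dotted column names
flat = pd.json_normalize(data["search_results"])

print(nested.columns.tolist())
print(flat.columns.tolist())
```

With nested records like these, the constructor alone leaves the inner dicts unexpanded, which is why the accepted answer relies on json_normalize.
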
Score: -1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/66197646
