你好,我正在尝试将字典转换为dataframe,其中包含在amazon上搜索的结果(我使用的是一个API)。我希望每个产品都是数据文件中的一行,键作为列标题。然而,在开始的时候有一些钥匙,我对在桌子上没有兴趣。
下面是将JSON转换为字典,然后将其转换为dataframe。
from pandas.io.json import json_normalize
filename = api_result.json()
def convert_json_to_dict(filename):
with open(filename) as JSON:
json_dict = json.load(JSON)
return json_dict
def convert_dict_to_df(filename):
return pd.json_normalize(convert_json_to_dict(filename))以下是字典中的部分数据( 25种产品中有2种)。
filename = {
'request_info': {
'credits_remaining': 72,
'credits_used': 28,
'credits_used_this_request': 1,
'success': True
},
'request_metadata': {
'amazon_url': 'https://www.amazon.com/s/?k=memory+cards&ref=nb_sb_noss_2',
'created_at': '2021-02-14T15:09:04.802Z',
'processed_at': '2021-02-14T15:09:11.003Z',
'timing': ['global_init - 0ms (total 0ms)',
'auth_apikey - 35ms (total 35ms)',
'auth_retrieve_plan - 20ms (total 56ms)',
'auth_retrieve_credit_usage - 22ms (total '
'79ms)',
'processing_invoking_worker - 31ms (total '
'111ms)',
'processing_execution_complete - 6202ms '
'(total 6313ms)',
'auth_credit_usage_reconcile - 81ms (total '
'6394ms)',
'global_end - 0ms (total 6394ms)'],
'total_time_taken': 6.2
},
'request_parameters': {
'amazon_domain': 'amazon.com',
'search_term': 'memory cards',
'type': 'search'
},
'search_results': [{
'asin': 'B08L26TYQ3',
'categories': [{
'id': 'search-alias=aps',
'name': 'All Departments'
}
],
'delivery': {
'price': {
'currency': 'USD',
'is_free': True,
'raw': 'FREE Shipping by Amazon',
'symbol': '$',
'value': 0
},
'tagline': 'Get it as soon as Tue, Feb 16'
},
'image': 'https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg',
'is_amazon_fresh': False,
'is_prime': True,
'is_whole_foods_market': False,
'link': 'https://www.amazon.com/dp/B08L26TYQ3',
'position': 1,
'price': {
'currency': 'USD',
'raw': '$29.99',
'symbol': '$',
'value': 29.99
},
'prices': [{
'currency': 'USD',
'raw': '$29.99',
'symbol': '$',
'value': 29.99
}
],
'rating': 4.3,
'ratings_total': 74,
'sponsored': True,
'title': 'Micro Center Premium 256GB SDXC Card Class 10 '
'SD Flash Memory Card UHS-I C10 U3 V30 4K UHD '
'Video R/W Speed up to 80 MB/s for Cameras '
'Computers Trail Cams (256GB)'
}, {
'asin': 'B08N46XMPH',
'categories': [{
'id': 'search-alias=aps',
'name': 'All Departments'
}
],
'delivery': {
'price': {
'currency': 'USD',
'is_free': True,
'raw': 'FREE Shipping on orders '
'over $25 shipped by Amazon',
'symbol': '$',
'value': 0
},
'tagline': 'Get it as soon as Tue, Feb 16'
},
'image': 'https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg',
'is_amazon_fresh': False,
'is_prime': True,
'is_whole_foods_market': False,
'link': 'https://www.amazon.com/dp/B08N46XMPH',
'position': 2,
'price': {
'currency': 'USD',
'raw': '$22.68',
'symbol': '$',
'value': 22.68
},
'prices': [{
'currency': 'USD',
'raw': '$22.68',
'symbol': '$',
'value': 22.68
}
],
'rating': 4.8,
'ratings_total': 16,
'sponsored': True,
'title': '256GB Micro SD Memory Card SD Memory Card/TF '
'Card Class 10 High Speed Card with Adapter for '
'Camera, Phone, Computer, Dash Came, '
'Surveillance,Drone'
}
]
}dataframe看起来如下所示,尽管有更多的列:
search_term Position ASIN Categories Price Currency
1 memory cards 1 B08L26TYQ3 All Departments 29.99 USD
2 memory cards 2 B08N46XMPH All Departments 22.68 USD我已经试过这个问题的答案了,但是没有用:将Python转换为数据格式
发布于 2021-02-14 18:58:54
json_normalize不再从pandas.io.json导入。它现在位于顶层命名空间中。'search_results'键中,但是'search_term'嵌套在'request_parameters'中,因此必须将键设置为pandas.json_normalize的meta参数的list。'prices'列中的信息似乎与其他列中的现有数据重叠。pandas.DataFrame.drop删除不需要的列,或者使用pandas.DataFrame.loc只选择所需的列,如下所示。df.join(pd.DataFrame(df.pop(col).values.tolist()))是从列中规范化单个级别dict并将其连接回主数据main的最快方法,但是这个回答展示了如何处理有问题的列(例如,在尝试.values.tolist()时会导致错误)。import pandas as pd
# object returned from api
filename = api_result.json()
# begin by normalizing filename
main = pd.json_normalize(filename, record_path=['search_results'], meta=['request_parameters'])
# request_paramets is a column of dicts, which must be converted to individual columns for each key
x = main.join(pd.DataFrame(main.pop('request_parameters').values.tolist()))
# categories and prices are lists of dicts, which must be exploded into separate rows
x = x.apply(pd.Series.explode)
# convert the dicts in categories and prices to separate columns for each key
x = x.join(pd.DataFrame(x.pop('categories').values.tolist()))
x = x.join(pd.DataFrame(x.pop('prices').values.tolist()))
# display(x)
asin image is_amazon_fresh is_prime is_whole_foods_market link position rating ratings_total sponsored title delivery.price.currency delivery.price.is_free delivery.price.raw delivery.price.symbol delivery.price.value delivery.tagline price.currency price.raw price.symbol price.value amazon_domain search_term type id name currency raw symbol value
0 B08L26TYQ3 https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg False True False https://www.amazon.com/dp/B08L26TYQ3 1 4.3 74 True Micro Center Premium 256GB SDXC Card Class 10 SD Flash Memory Card UHS-I C10 U3 V30 4K UHD Video R/W Speed up to 80 MB/s for Cameras Computers Trail Cams (256GB) USD True FREE Shipping by Amazon $ 0 Get it as soon as Tue, Feb 16 USD $29.99 $ 29.99 amazon.com memory cards search search-alias=aps All Departments USD $29.99 $ 29.99
1 B08N46XMPH https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg False True False https://www.amazon.com/dp/B08N46XMPH 2 4.8 16 True 256GB Micro SD Memory Card SD Memory Card/TF Card Class 10 High Speed Card with Adapter for Camera, Phone, Computer, Dash Came, Surveillance,Drone USD True FREE Shipping on orders over $25 shipped by Amazon $ 0 Get it as soon as Tue, Feb 16 USD $22.68 $ 22.68 amazon.com memory cards search search-alias=aps All Departments USD $22.68 $ 22.68
# use loc to select only the desired columns and rename them as needed
final = x.loc[:, ['asin', 'search_term', 'position', 'name', 'price.currency', 'price.value']].rename(columns={'name': 'Categories', 'price.currency': 'currency', 'price.value': 'price'})
# display(final)
asin search_term position Categories currency price
0 B08L26TYQ3 memory cards 1 All Departments USD 29.99
1 B08N46XMPH memory cards 2 All Departments USD 22.68发布于 2021-02-14 16:48:23
来自官方的文档
import pandas as pd
df = pd.read_json(r'/path/to/filename.json')或者试试这个:
str_json = json.loads(filename.text)
df = pd.DataFrame.from_dict(str_json)https://stackoverflow.com/questions/66197646
复制相似问题