因此,这可能是一个重复的问题,但我会尝试,因为我还没有找到任何东西。
我正试着和熊猫平起平坐,正常的工作。看看文档的例子,这里是我试图做的最接近的例子:
data = [{'state': 'Florida',
'shortname': 'FL',
'info': {'governor': 'Rick Scott'},
'counties': [{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}]},
{'state': 'Ohio',
'shortname': 'OH',
'info': {'governor': 'John Kasich'},
'counties': [{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]}]
result = pd.json_normalize(data, 'counties', ['state', 'shortname',
['info', 'governor']])
result
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich但是,这个示例向我们展示了一种获取counties中数据的方法,它与列状态和短名称并排。假设我在每个json对象的n列数位于root (上面示例中state或shortname列的n数)。我如何将它们全部包括在内,以便使县平,但保留所有相邻的东西?
首先,我尝试了这样的方法:
#None to treat data as a list of records
#Result of counties is still nested, not working
result = pd.json_normalize(data, None, ['counties'])或
result = pd.json_normalize(data, None, ['counties', 'name'])然后,我考虑使用dataframe.columns获取列并重用它,因为json_normalize的元参数可以使用字符串数组。
但我被困住了。columns似乎返回了我不想返回的嵌套json属性。
#still nested
cols = pd.json_normalize(data).columns.to_list()
#Exclude it because we already have it
cols = [index for index in cols if index != 'counties']
#remove nested columns if any
cols = [index for index in cols if "." not in index]
result = pd.json_normalize(data, 'counties', cols, errors="ignore")#still nested
name population state shortname ... other6 other7 counties info.governor
0 Dade 12345 Florida FL ... dumb_data dumb_data [{'name': 'Dade', 'population': 12345}, {'name... NaN
1 Broward 40000 Florida FL ... dumb_data dumb_data [{'name': 'Dade', 'population': 12345}, {'name... NaN
2 Palm Beach 60000 Florida FL ... dumb_data dumb_data [{'name': 'Dade', 'population': 12345}, {'name... NaN
3 Summit 1234 Ohio OH ... dumb_data dumb_data [{'name': 'Summit', 'population': 1234}, {'nam... NaN
4 Cuyahoga 1337 Ohio OH ... dumb_data dumb_data [{'name': 'Summit', 'population': 1234}, {'nam... NaN我不想仅仅对列名进行编码,因为它们改变了,在这个情况下,我有64个.
为了更好地理解,这是我从Woo Rest API处理的真正的数据。我不是在这里使用它,因为它真的很长,但基本上,我只是试图使line_items中只保留product_id,当然还有与line_items相邻的所有其他列。
发布于 2022-03-29 07:22:18
好的,伙计们,如果你想把一个json压平,并保留其他的东西,你应该使用pd.Dataframe.explode()
以下是我的逻辑:
import pandas as pd
data = [
{'state': 'Florida',
'shortname': 'FL',
'info': {'governor': 'Rick Scott'},
'counties': [
{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}
]
},
{'state': 'Ohio',
'shortname': 'OH',
'info': {'governor': 'John Kasich'},
'counties': [{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]}
]
#No Formating only converting to a Df
result = pd.json_normalize(data)
#Exploding the wanted nested column
exploded = result.explode('counties')
#Keeping the name only - this can be custom
exploded['countie_name'] = exploded['counties'].apply(lambda x: x['name'])
#Drop the used column since we took what interested us inside it.
exploded = exploded.drop(['counties'], axis=1)
print(exploded)#Duplicate for Florida, as wanted with diferent countie names
state shortname info.governor countie_name
0 Florida FL Rick Scott Dade
0 Florida FL Rick Scott Broward
0 Florida FL Rick Scott Palm Beach
1 Ohio OH John Kasich Summit
1 Ohio OH John Kasich Cuyahoga假设您将产品篮的内容作为嵌套的json,要在保留常规购物篮属性的同时explode该篮子的内容,则可以这样做。
https://stackoverflow.com/questions/71656819
复制相似问题