假设我有一个API响应,如:
{
"fact": {
"UP": [{
"SCODE": "CNB",
"SNAME": "Kanpur Central"
}, {
"SCODE": "JHS",
"SNAME": "Jhansi Junction"
}],
"MP": [{
"SCODE": "BPL",
"SNAME": "Bhopal Junction"
}, {
"SCODE": "JBP",
"SNAME": "Jabalpur Junction"
}]
}
}我必须将其转换为如下所示的数据格式(预期输出):
fact SCODE SNAME
UP CNB Kanpur Central
UP JHS Jhansi Junction
MP BPL Bhopal Junction
MP JBP Jabalpur Junction我的努力:我尝试使用json_normalize(),但没有达到预期的输出:
pd.json_normalize(response).apply(pd.Series.explode)发布于 2021-06-15 01:51:25
一种选择是使用python重新构建:
df = pd.DataFrame([{'fact': k, **item}
for k, lst in response['fact'].items()
for item in lst]) fact SCODE SNAME
0 UP CNB Kanpur Central
1 UP JHS Jhansi Junction
2 MP BPL Bhopal Junction
3 MP JBP Jabalpur Junctionpandas通过explode + apply pd.Series的选项
df = (
pd.DataFrame(response)['fact']
.explode()
.apply(pd.Series)
.rename_axis('fact')
.reset_index()
) fact SCODE SNAME
0 MP BPL Bhopal Junction
1 MP JBP Jabalpur Junction
2 UP CNB Kanpur Central
3 UP JHS Jhansi Junction发布于 2021-06-15 01:57:45
response,您必须创建另一个结构,因为json_normalize使用字典列表,而fact必须包含在那里:。
new_response = [{"fact": rfact, **r} for rfact in response["fact"] for r in response["fact"][rfact]]最后,只需应用该函数:
final_result = pd.json_normalize(new_response)
fact SCODE SNAME
0 UP CNB Kanpur Central
1 UP JHS Jhansi Junction
2 MP BPL Bhopal Junction
3 MP JBP Jabalpur Junction发布于 2021-06-15 09:10:22
不像直接在字典中工作那样有效(所选的答案做得很好):
data = {
"fact": {
"UP": [{
"SCODE": "CNB",
"SNAME": "Kanpur Central"
}, {
"SCODE": "JHS",
"SNAME": "Jhansi Junction"
}],
"MP": [{
"SCODE": "BPL",
"SNAME": "Bhopal Junction"
}, {
"SCODE": "JBP",
"SNAME": "Jabalpur Junction"
}]
}
}
keys = data['fact']
(pd.concat([jn(data['fact'][key]) for key in keys],
keys = keys)
.droplevel(-1)
.rename_axis(index='fact')
.reset_index()
)
fact SCODE SNAME
0 UP CNB Kanpur Central
1 UP JHS Jhansi Junction
2 MP BPL Bhopal Junction
3 MP JBP Jabalpur Junctionhttps://stackoverflow.com/questions/67978892
复制相似问题