I need help converting JSON data into a DataFrame. Can you show me how to do it?
Example:
JSON data
{
    "user_id": "vmani4",
    "password": "*****",
    "api_name": "KOL",
    "body": {
        "api_name": "KOL",
        "columns": [
            "kol_id",
            "jnj_id",
            "kol_full_nm",
            "thrc_cd"
        ],
        "filter": {
            "kol_id": "101152",
            "jnj_id": "7124166",
            "thrc_nm": "VIR"
        }
    }
}
Desired output:
user_id  password  api_name  columns      filter   filter_value
vmani    ******    KOL       kol_id       kol_id   101152
                             jnj_id       jnj_id   7124166
                             kol_full_nm  thrc_nm  VIR
                             thrc_cd
Posted on 2020-09-05 17:18:01
I'm not very familiar with DataFrames, but I did my best to come up with a solution that produces the output you want.
Code
import json

import pandas as pd

json_data = """{
    "user_id": "vmani4",
    "password": "*****",
    "api_name": "KOL",
    "body": {
        "api_name": "KOL",
        "columns": [
            "kol_id",
            "jnj_id",
            "kol_full_nm",
            "thrc_cd"
        ],
        "filter": {
            "kol_id": "101152",
            "jnj_id": "7124166",
            "thrc_nm": "VIR"
        }
    }
}"""
python_data = json.loads(json_data)

# Containers for the pieces of the final table
filter = {}
list_for_filter = []
filter_value = {}
list_for_filter_value = []
first_level = {}
for_columns = {}

for x, y in python_data.items():
    if isinstance(y, dict):
        # "body": pull out the columns list and the filter key/value pairs
        for j, k in y.items():
            if j == 'columns':
                for_columns[j] = k
            if isinstance(k, dict):
                for m, n in k.items():
                    list_for_filter.append(m)
                    list_for_filter_value.append(n)
        # stop here so "body" itself is not added as a column
        break
    # top-level scalar values become one-element columns
    first_level[x] = [y]

filter['filter'] = list_for_filter
filter_value['filter_value'] = list_for_filter_value
res = {**first_level, **for_columns, **filter, **filter_value}

# the lists have different lengths, so build one Series per column and concat
df = pd.concat([pd.Series(v, name=k) for k, v in res.items()], axis=1)
print(df)
Output
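  user_id password api_name      columns   filter filter_value
0  vmani4    *****      KOL       kol_id   kol_id       101152
1     NaN      NaN      NaN       jnj_id   jnj_id      7124166
2     NaN      NaN      NaN  kol_full_nm  thrc_nm          VIR
3     NaN      NaN      NaN      thrc_cd      NaN          NaN
Let me briefly explain my code. I start by creating a number of lists and dicts because the output you want contains some columns that are not actually present in the JSON, such as filter_value.
I also loop over the dict items to build another dict that matches the desired output.
Finally, because the lists going into the DataFrame have unequal lengths, I used concat with Series.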
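To illustrate the last point, here is a minimal sketch (not part of the original answer) of why concat with Series is used. It assumes a cut-down version of the answer's res dict; the constructor_failed flag is a hypothetical name added only for the demonstration.

```python
import pandas as pd

# Cut-down stand-in for the answer's `res` dict: lists of unequal length.
res = {
    "user_id": ["vmani4"],
    "columns": ["kol_id", "jnj_id", "kol_full_nm", "thrc_cd"],
    "filter": ["kol_id", "jnj_id", "thrc_nm"],
}

# Passing unequal-length lists straight to the constructor raises ValueError.
try:
    pd.DataFrame(res)
    constructor_failed = False
except ValueError:
    constructor_failed = True

# Wrapping each list in a Series lets concat align on the index and
# pad the shorter columns with NaN.
df = pd.concat([pd.Series(v, name=k) for k, v in res.items()], axis=1)
print(df)
```

The resulting frame has four rows, with NaN filling the rows where a shorter column has no value.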
Posted on 2020-09-05 16:03:42
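data is the JSON from the question, as a Python dict.
Use pandas.json_normalize to load the JSON into a DataFrame, and drop the unneeded columns.
Use pandas.DataFrame.explode to expand the 'body.columns' list into separate rows.
Load data['body']['filter'] into a separate DataFrame and clean it up.
Use pandas.DataFrame.join to combine the two DataFrames.
'thrc_nm' does not map to anything in 'body.columns'. 'filter' and 'filter_value' are added as separate columns, in the order they appear in the JSON, and are not associated with 'body.columns'.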
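The remark that 'filter' is not associated with 'body.columns' follows from how DataFrame.join pairs rows: by default it is a left join on the index, so rows are matched by position rather than by value. A minimal sketch with two toy frames (the names left, right and joined are illustrative; the values are copied from the question):

```python
import pandas as pd

# Stand-in for the exploded frame (index 0..3).
left = pd.DataFrame(
    {"body.columns": ["kol_id", "jnj_id", "kol_full_nm", "thrc_cd"]}
)

# Stand-in for the frame built from data['body']['filter'] (index 0..2).
right = pd.DataFrame(
    {
        "filter": ["kol_id", "jnj_id", "thrc_nm"],
        "filter_value": ["101152", "7124166", "VIR"],
    }
)

# Left join on the index: row i of `left` gets row i of `right`,
# whatever the values are; rows without a partner get NaN.
joined = left.join(right)
print(joined)
```

Row 2 pairs 'kol_full_nm' with 'thrc_nm' purely because both sit at index 2. This is presumably also why explode is called with ignore_index=True: without it, every exploded row would keep index 0 and be joined to the first filter row only.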
Python 3.10, pandas 1.4.3
import pandas as pd
# load the json data
df = pd.json_normalize(data).drop(columns=['body.filter.kol_id', 'body.filter.jnj_id', 'body.filter.thrc_nm'])
# explode the column
df = df.explode('body.columns', ignore_index=True)
# load and clean data[body][filter]
df_filter = pd.DataFrame.from_dict(data['body']['filter'], orient='index').reset_index().rename(columns={'index': 'filter', 0: 'filter_value'})
# join the dataframes
dfj = df.join(df_filter)
# display(dfj)
  user_id password api_name body.api_name body.columns   filter filter_value
0  vmani4    *****      KOL           KOL       kol_id   kol_id       101152
1  vmani4    *****      KOL           KOL       jnj_id   jnj_id      7124166
2  vmani4    *****      KOL           KOL  kol_full_nm  thrc_nm          VIR
3  vmani4    *****      KOL           KOL      thrc_cd      NaN          NaN
Alternative option
# load data into a dataframe
df = pd.json_normalize(data)
# explode the column
df = df.explode('body.columns', ignore_index=True)
# display(df)
  user_id password api_name body.api_name body.columns body.filter.kol_id body.filter.jnj_id body.filter.thrc_nm
0  vmani4    *****      KOL           KOL       kol_id             101152            7124166                 VIR
1  vmani4    *****      KOL           KOL       jnj_id             101152            7124166                 VIR
2  vmani4    *****      KOL           KOL  kol_full_nm             101152            7124166                 VIR
3  vmani4    *****      KOL           KOL      thrc_cd             101152            7124166                 VIR
https://stackoverflow.com/questions/63755845