I have a data set like this:
values site_name timezone variable_name
0 [{'value': SAN JOAQUIN PST degC
[{'value': '9.3',
'qualifiers': ['P'],
'date': '2022-01-05'
},
{'value': '9.4',
'qualifiers': ['P'],
'date': '2022-01-05'
}]
}]
1 [{'value': SAN JOAQUIN PST pH
[{'value': '7.5',
'qualifiers': ['P'],
'date': '2022-01-05'
},
{'value': '7.8',
'qualifiers': ['P'],
'date': '2022-01-05'
}]
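}]
Here, values is a long list, and every row holds one of these nested sets of values. How can I use pandas to turn each variable name into its own DataFrame?
I would like something like this: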
degC table
value date qualifier
0 9.3 2022-01-05 P
1 9.4 2022-01-05 P
pH table
value date qualifier
0 7.5 2022-01-05 P
1 7.8 2022-01-05 P
Here is what I have tried so far:
df = pd.json_normalize(file)
for i in range(len(df.index)):
    pd.json_normalize(df.iloc[i])
The raw input, as shown above:
df = pd.DataFrame({'values':[[{'value': [{'value': '9.3', 'qualifiers': ['P'], 'date': '2022-01-05'},
{'value': '9.4', 'qualifiers': ['P'], 'date': '2022-01-05'}]
}],
[{'value': [{'value': '7.5', 'qualifiers': ['P'], 'date': '2022-01-05'},
{'value': '7.8', 'qualifiers': ['P'], 'date': '2022-01-05'}]
}]],
'variable_name':['degC','pH']})
Posted on 2022-01-06 22:17:17
With the DataFrame:
>>> df
values variable_name
0 [{'value': [{'value': '9.3', 'qualifiers': ['P'] ... degC
1 [{'value': [{'value': '7.5', 'qualifiers': ['P'] ... pH
Use this approach, which builds a Series holding one DataFrame per variable:
new_df = df['values'].apply(lambda x: pd.DataFrame(x[0]['value']))
new_df.index = df['variable_name']
>>> new_df.loc['degC']
value qualifiers date
0 9.3 [P] 2022-01-05
1 9.4 [P] 2022-01-05
>>> new_df.loc['pH']
value qualifiers date
0 7.5 [P] 2022-01-05
1 7.8 [P] 2022-01-05
https://stackoverflow.com/questions/70613812
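The output above still has qualifiers as a list ([P]) and value as a string, while the question asks for a plain qualifier column. A minimal sketch of one way to get closer to the requested tables, not part of the original answer: it assumes each cell of values holds exactly one outer dict (as in the sample input), that variable names are unique, and that multiple qualifiers can simply be joined with a comma. The names tables and table are made up for illustration.

```python
import pandas as pd

# Sample input copied from the question.
df = pd.DataFrame({
    'values': [
        [{'value': [{'value': '9.3', 'qualifiers': ['P'], 'date': '2022-01-05'},
                    {'value': '9.4', 'qualifiers': ['P'], 'date': '2022-01-05'}]}],
        [{'value': [{'value': '7.5', 'qualifiers': ['P'], 'date': '2022-01-05'},
                    {'value': '7.8', 'qualifiers': ['P'], 'date': '2022-01-05'}]}],
    ],
    'variable_name': ['degC', 'pH'],
})

# Hypothetical container: one DataFrame per variable name.
tables = {}
for name, cell in zip(df['variable_name'], df['values']):
    # Assumption: only the first (and only) outer dict in each cell matters.
    table = pd.DataFrame(cell[0]['value'])
    # Assumption: join each list of qualifiers into one comma-separated string.
    table['qualifier'] = table.pop('qualifiers').str.join(',')
    # Convert the string readings to numbers.
    table['value'] = table['value'].astype(float)
    # Reorder columns to match the layout requested in the question.
    tables[name] = table[['value', 'date', 'qualifier']]

print(tables['degC'])
print(tables['pH'])
```

A plain dict keyed by variable name keeps each table as an ordinary DataFrame, so tables['pH'] can be used directly. If the same variable name can occur in more than one row, later rows would overwrite earlier ones in this sketch.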