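I'm using pandas.io.json.json_normalize() to convert a JSON object into a DataFrame (flat data). The JSON has a nested key that contains empty items. When I run the normalization, I need that item to come back as an empty string or null. For example: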
from pandas.io.json import json_normalize  # pandas 0.25

python_json_nested_data = [
    {"key": "KEY-1", "state": "MA", "items": ["orange", "meat", "bread"], "date": "Y16"},
    {"key": "KEY-2", "state": "MA", "items": ["apple", "bread"], "date": "Y15"},
    {"key": "KEY-3", "state": "TX", "items": ["bread"], "date": "Y16"},
    {"key": "KEY-4", "state": "TN", "items": ["apple", "bread"], "date": "Y16"},
    {"key": "KEY-5", "state": "TN", "items": ["apple", "orange"], "date": "Y15"},
    {"key": "KEY-6", "state": "TN", "items": [], "date": "Y14"},
]

df_normalize = json_normalize(python_json_nested_data, ['items'], meta=['key', 'state', 'date'],
                              record_prefix='category.', errors='raise')
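What am I missing? I followed the link in this post and understood that this was a bug that was fixed in an earlier pandas version, but I'm using 0.25.
I want the result to include row 10: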
+----+--------------+-------+---------+--------+
|    | category.0   | key   | state   | date   |
|----+--------------+-------+---------+--------|
|  0 | orange       | KEY-1 | MA      | Y16    |
|  1 | meat         | KEY-1 | MA      | Y16    |
|  2 | bread        | KEY-1 | MA      | Y16    |
|  3 | apple        | KEY-2 | MA      | Y15    |
|  4 | bread        | KEY-2 | MA      | Y15    |
|  5 | bread        | KEY-3 | TX      | Y16    |
|  6 | apple        | KEY-4 | TN      | Y16    |
|  7 | bread        | KEY-4 | TN      | Y16    |
|  8 | apple        | KEY-5 | TN      | Y15    |
|  9 | orange       | KEY-5 | TN      | Y15    |
| 10 | nan          | KEY-6 | TN      | Y14    |
+----+--------------+-------+---------+--------+
Thanks
Posted on 2020-06-06 15:44:49
You can do it like this:
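import pandas as pd
# pd.json_normalize is top-level from pandas 1.0; on 0.25 use pandas.io.json.json_normalize.
df = pd.json_normalize(python_json_nested_data)  # one row per record; 'items' stays a list
df = df.explode('items').reset_index(drop=True)  # one row per item; an empty list becomes NaN
print(df)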
key state items date
0 KEY-1 MA orange Y16
1 KEY-1 MA meat Y16
2 KEY-1 MA bread Y16
3 KEY-2 MA apple Y15
4 KEY-2 MA bread Y15
5 KEY-3 TX bread Y16
6 KEY-4 TN apple Y16
7 KEY-4 TN bread Y16
8 KEY-5 TN apple Y15
9 KEY-5 TN orange Y15
10 KEY-6 TN NaN Y14
https://stackoverflow.com/questions/62233984
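If you'd rather keep the question's `record_path`/`meta` call (and its `category.0` column), one possible workaround, offered here as a sketch rather than a documented pandas option, is to replace each empty `items` list with `[None]` before normalizing. That way the record produces one placeholder row instead of none. The sketch assumes pandas ≥ 1.0 (`pd.json_normalize`); on 0.25 the same call should work through `pandas.io.json.json_normalize`. The names `padded` and `df_blank` are purely illustrative.

```python
import pandas as pd

python_json_nested_data = [
    {"key": "KEY-1", "state": "MA", "items": ["orange", "meat", "bread"], "date": "Y16"},
    {"key": "KEY-2", "state": "MA", "items": ["apple", "bread"], "date": "Y15"},
    {"key": "KEY-3", "state": "TX", "items": ["bread"], "date": "Y16"},
    {"key": "KEY-4", "state": "TN", "items": ["apple", "bread"], "date": "Y16"},
    {"key": "KEY-5", "state": "TN", "items": ["apple", "orange"], "date": "Y15"},
    {"key": "KEY-6", "state": "TN", "items": [], "date": "Y14"},
]

# Copy each record, substituting [None] for an empty (or missing) 'items' list,
# so json_normalize emits one placeholder row instead of dropping the record.
padded = [{**rec, "items": rec.get("items") or [None]} for rec in python_json_nested_data]

df_normalize = pd.json_normalize(
    padded, ["items"], meta=["key", "state", "date"],
    record_prefix="category.", errors="raise",
)
print(df_normalize)

# Optional: show the placeholder as an empty string instead of None.
df_blank = df_normalize.fillna({"category.0": ""})
```

Unlike the `explode` approach, this keeps the question's column naming. The trade-off is an extra pass over the input to copy the records.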