这是使用的链接:http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_mmor?precision=1&geo=HU&geo=PL&geo=SK&unit=RCH_M&coicop=CP00,它有3个国家的通货膨胀数据。到目前为止,这是我的代码:
import urllib.request, json
import pandas as pd
import requests
from pandas.io.json import json_normalize
url = r"http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_mmor?precision=1&geo=HU&geo=PL&geo=SK&unit=RCH_M&coicop=CP00"
with urllib.request.urlopen(url) as url:
data = json.loads(url.read().decode())
df2 = pd.DataFrame(pd.json_normalize(data))
print(df) 这就是输出,这与我所需要的相差甚远.:对于JSON文件中提到的每一个日期,我需要3个国家的HCIP。
version label \
0 2.0 HICP (2015 = 100) - monthly data (monthly rate...
href source updated \
0 http://ec.europa.eu/eurostat/wdds/rest/data/v2... Eurostat 2021-08-31
class id size status.0 status.1 ... \
0 dataset [unit, coicop, geo, time] [1, 1, 3, 307] d d ...
dimension.time.category.label.2020M11 dimension.time.category.label.2020M12 \
0 2020M11 2020M12
dimension.time.category.label.2021M01 dimension.time.category.label.2021M02 \
0 2021M01 2021M02
dimension.time.category.label.2021M03 dimension.time.category.label.2021M04 \
0 2021M03 2021M04
dimension.time.category.label.2021M05 dimension.time.category.label.2021M06 \
0 2021M05 2021M06
dimension.time.category.label.2021M07 dimension.time.category.label.2021M08
0 2021M07 2021M08 知道我做错了什么吗?或者如何正确导入这样的格式?
发布于 2021-09-18 13:20:42
根据我对你数据的了解:
data['id']指定dimensionsdata['dimension'][*]['category']的顺序,指定每个dimensiondata['value']的值顺序是原始值。剩下的似乎是多余的。
现在最简单的是提取值,让我们还确保索引正确排序:
>>> val = pd.Series(data['value']).rename(index=int).sort_index()
>>> val
0 2.4
1 1.6
2 1.7
3 2.2
4 0.8
...
916 0.2
917 0.6
918 0.4
919 0.5
920 0.2
Length: 921, dtype: float64然后,对于每个维度,我们可以根据信息构造一个简单的数据,例如对于geo
>>> pd.DataFrame({key: val for key, val in data['dimension']['geo']['category'].items()})
index label
HU 0 Hungary
PL 1 Poland
SK 2 Slovakia因此,按列index进行排序,我们将按照所需的顺序对label进行排序,并根据data['id']对这些排序进行排序,得到:
>>> dimensions = [pd.DataFrame({
... key: val for key, val in data['dimension'][dim]['category'].items()
... }).sort_values('index')['label'].values for dim in data['id']]
>>> dimensions
[array(['Monthly rate of change'], dtype=object), array(['All-items HICP'], dtype=object), array(['Hungary', 'Poland', 'Slovakia'], dtype=object), array(['1996M02', '1996M03', '1996M04', '1996M05', '1996M06', '1996M07',
'1996M08', '1996M09', '1996M10', '1996M11', '1996M12', '1997M01',
'1997M02', '1997M03', '1997M04', '1997M05', '1997M06', '1997M07',
'1997M08', '1997M09', '1997M10', '1997M11', '1997M12', '1998M01',
'1998M02', '1998M03', '1998M04', '1998M05', '1998M06', '1998M07',
'1998M08', '1998M09', '1998M10', '1998M11', '1998M12', '1999M01',
'1999M02', '1999M03', '1999M04', '1999M05', '1999M06', '1999M07',
'1999M08', '1999M09', '1999M10', '1999M11', '1999M12', '2000M01',
'2000M02', '2000M03', '2000M04', '2000M05', '2000M06', '2000M07',
'2000M08', '2000M09', '2000M10', '2000M11', '2000M12', '2001M01',
'2001M02', '2001M03', '2001M04', '2001M05', '2001M06', '2001M07',
'2001M08', '2001M09', '2001M10', '2001M11', '2001M12', '2002M01',
'2002M02', '2002M03', '2002M04', '2002M05', '2002M06', '2002M07',
'2002M08', '2002M09', '2002M10', '2002M11', '2002M12', '2003M01',
'2003M02', '2003M03', '2003M04', '2003M05', '2003M06', '2003M07',
'2003M08', '2003M09', '2003M10', '2003M11', '2003M12', '2004M01',
'2004M02', '2004M03', '2004M04', '2004M05', '2004M06', '2004M07',
'2004M08', '2004M09', '2004M10', '2004M11', '2004M12', '2005M01',
'2005M02', '2005M03', '2005M04', '2005M05', '2005M06', '2005M07',
'2005M08', '2005M09', '2005M10', '2005M11', '2005M12', '2006M01',
'2006M02', '2006M03', '2006M04', '2006M05', '2006M06', '2006M07',
'2006M08', '2006M09', '2006M10', '2006M11', '2006M12', '2007M01',
'2007M02', '2007M03', '2007M04', '2007M05', '2007M06', '2007M07',
'2007M08', '2007M09', '2007M10', '2007M11', '2007M12', '2008M01',
'2008M02', '2008M03', '2008M04', '2008M05', '2008M06', '2008M07',
'2008M08', '2008M09', '2008M10', '2008M11', '2008M12', '2009M01',
'2009M02', '2009M03', '2009M04', '2009M05', '2009M06', '2009M07',
'2009M08', '2009M09', '2009M10', '2009M11', '2009M12', '2010M01',
'2010M02', '2010M03', '2010M04', '2010M05', '2010M06', '2010M07',
'2010M08', '2010M09', '2010M10', '2010M11', '2010M12', '2011M01',
'2011M02', '2011M03', '2011M04', '2011M05', '2011M06', '2011M07',
'2011M08', '2011M09', '2011M10', '2011M11', '2011M12', '2012M01',
'2012M02', '2012M03', '2012M04', '2012M05', '2012M06', '2012M07',
'2012M08', '2012M09', '2012M10', '2012M11', '2012M12', '2013M01',
'2013M02', '2013M03', '2013M04', '2013M05', '2013M06', '2013M07',
'2013M08', '2013M09', '2013M10', '2013M11', '2013M12', '2014M01',
'2014M02', '2014M03', '2014M04', '2014M05', '2014M06', '2014M07',
'2014M08', '2014M09', '2014M10', '2014M11', '2014M12', '2015M01',
'2015M02', '2015M03', '2015M04', '2015M05', '2015M06', '2015M07',
'2015M08', '2015M09', '2015M10', '2015M11', '2015M12', '2016M01',
'2016M02', '2016M03', '2016M04', '2016M05', '2016M06', '2016M07',
'2016M08', '2016M09', '2016M10', '2016M11', '2016M12', '2017M01',
'2017M02', '2017M03', '2017M04', '2017M05', '2017M06', '2017M07',
'2017M08', '2017M09', '2017M10', '2017M11', '2017M12', '2018M01',
'2018M02', '2018M03', '2018M04', '2018M05', '2018M06', '2018M07',
'2018M08', '2018M09', '2018M10', '2018M11', '2018M12', '2019M01',
'2019M02', '2019M03', '2019M04', '2019M05', '2019M06', '2019M07',
'2019M08', '2019M09', '2019M10', '2019M11', '2019M12', '2020M01',
'2020M02', '2020M03', '2020M04', '2020M05', '2020M06', '2020M07',
'2020M08', '2020M09', '2020M10', '2020M11', '2020M12', '2021M01',
'2021M02', '2021M03', '2021M04', '2021M05', '2021M06', '2021M07',
'2021M08'], dtype=object)]现在,只需使用它来定义数据的索引,您就完成了:
>>> val.index = pd.MultiIndex.from_product(dimensions, names=data['id'])
>>> val
unit coicop geo time
Monthly rate of change All-items HICP Hungary 1996M02 2.4
1996M03 1.6
1996M04 1.7
1996M05 2.2
1996M06 0.8
...
Slovakia 2021M04 0.2
2021M05 0.6
2021M06 0.4
2021M07 0.5
2021M08 0.2
Length: 921, dtype: float64现在,使用stack和unstack,您可以随意地旋转维度:
>>> val.unstack('geo')
geo Hungary Poland Slovakia
unit coicop time
Monthly rate of change All-items HICP 1996M02 2.4 1.4 0.3
1996M03 1.6 1.6 0.2
1996M04 1.7 2.0 0.3
1996M05 2.2 1.3 0.5
1996M06 0.8 0.9 0.2
... ... ... ...
2021M04 0.8 0.7 0.2
2021M05 0.6 0.4 0.6
2021M06 0.4 0.1 0.4
2021M07 0.6 0.4 0.5
2021M08 0.2 0.3 0.2
[307 rows x 3 columns]https://stackoverflow.com/questions/69234585
复制相似问题