首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >我如何正确地将这个JSON文件导入熊猫?

我如何正确地将这个JSON文件导入熊猫?
EN

Stack Overflow用户
提问于 2021-09-18 12:20:18
回答 1查看 78关注 0票数 0

这是使用的链接:http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_mmor?precision=1&geo=HU&geo=PL&geo=SK&unit=RCH_M&coicop=CP00,它有3个国家的通货膨胀数据。到目前为止,这是我的代码:

代码语言:javascript
复制
import urllib.request, json
import pandas as pd
import requests
from pandas.io.json import json_normalize

url = r"http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_mmor?precision=1&geo=HU&geo=PL&geo=SK&unit=RCH_M&coicop=CP00"
with urllib.request.urlopen(url) as url:
    data = json.loads(url.read().decode())
    df2 = pd.DataFrame(pd.json_normalize(data))
print(df) 

这就是输出,这与我所需要的相差甚远.:对于JSON文件中提到的每一个日期,我需要3个国家的HCIP。

代码语言:javascript
复制
 version                                              label  \
0     2.0  HICP (2015 = 100) - monthly data (monthly rate...   

                                                href    source     updated  \
0  http://ec.europa.eu/eurostat/wdds/rest/data/v2...  Eurostat  2021-08-31   

     class                         id            size status.0 status.1  ...  \
0  dataset  [unit, coicop, geo, time]  [1, 1, 3, 307]        d        d  ...   

  dimension.time.category.label.2020M11 dimension.time.category.label.2020M12  \
0                               2020M11                               2020M12   

  dimension.time.category.label.2021M01 dimension.time.category.label.2021M02  \
0                               2021M01                               2021M02   

  dimension.time.category.label.2021M03 dimension.time.category.label.2021M04  \
0                               2021M03                               2021M04   

  dimension.time.category.label.2021M05 dimension.time.category.label.2021M06  \
0                               2021M05                               2021M06   

  dimension.time.category.label.2021M07 dimension.time.category.label.2021M08  
0                               2021M07                               2021M08  

知道我做错了什么吗?或者如何正确导入这样的格式?

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-09-18 13:20:42

根据我对你数据的了解:

  • data['id']指定dimensions
  • data['dimension'][*]['category']的顺序,指定每个dimension
  • data['value']的值顺序是原始值。

剩下的似乎是多余的。

现在最简单的是提取值,让我们还确保索引正确排序:

代码语言:javascript
复制
>>> val = pd.Series(data['value']).rename(index=int).sort_index()
>>> val
0      2.4
1      1.6
2      1.7
3      2.2
4      0.8
      ... 
916    0.2
917    0.6
918    0.4
919    0.5
920    0.2
Length: 921, dtype: float64

然后,对于每个维度,我们可以根据信息构造一个简单的数据,例如对于geo

代码语言:javascript
复制
>>> pd.DataFrame({key: val for key, val in data['dimension']['geo']['category'].items()})
    index     label
HU      0   Hungary
PL      1    Poland
SK      2  Slovakia

因此,按列index进行排序,我们将按照所需的顺序对label进行排序,并根据data['id']对这些排序进行排序,得到:

代码语言:javascript
复制
>>> dimensions = [pd.DataFrame({
...     key: val for key, val in data['dimension'][dim]['category'].items()
... }).sort_values('index')['label'].values for dim in data['id']]
>>> dimensions
[array(['Monthly rate of change'], dtype=object), array(['All-items HICP'], dtype=object), array(['Hungary', 'Poland', 'Slovakia'], dtype=object), array(['1996M02', '1996M03', '1996M04', '1996M05', '1996M06', '1996M07',
       '1996M08', '1996M09', '1996M10', '1996M11', '1996M12', '1997M01',
       '1997M02', '1997M03', '1997M04', '1997M05', '1997M06', '1997M07',
       '1997M08', '1997M09', '1997M10', '1997M11', '1997M12', '1998M01',
       '1998M02', '1998M03', '1998M04', '1998M05', '1998M06', '1998M07',
       '1998M08', '1998M09', '1998M10', '1998M11', '1998M12', '1999M01',
       '1999M02', '1999M03', '1999M04', '1999M05', '1999M06', '1999M07',
       '1999M08', '1999M09', '1999M10', '1999M11', '1999M12', '2000M01',
       '2000M02', '2000M03', '2000M04', '2000M05', '2000M06', '2000M07',
       '2000M08', '2000M09', '2000M10', '2000M11', '2000M12', '2001M01',
       '2001M02', '2001M03', '2001M04', '2001M05', '2001M06', '2001M07',
       '2001M08', '2001M09', '2001M10', '2001M11', '2001M12', '2002M01',
       '2002M02', '2002M03', '2002M04', '2002M05', '2002M06', '2002M07',
       '2002M08', '2002M09', '2002M10', '2002M11', '2002M12', '2003M01',
       '2003M02', '2003M03', '2003M04', '2003M05', '2003M06', '2003M07',
       '2003M08', '2003M09', '2003M10', '2003M11', '2003M12', '2004M01',
       '2004M02', '2004M03', '2004M04', '2004M05', '2004M06', '2004M07',
       '2004M08', '2004M09', '2004M10', '2004M11', '2004M12', '2005M01',
       '2005M02', '2005M03', '2005M04', '2005M05', '2005M06', '2005M07',
       '2005M08', '2005M09', '2005M10', '2005M11', '2005M12', '2006M01',
       '2006M02', '2006M03', '2006M04', '2006M05', '2006M06', '2006M07',
       '2006M08', '2006M09', '2006M10', '2006M11', '2006M12', '2007M01',
       '2007M02', '2007M03', '2007M04', '2007M05', '2007M06', '2007M07',
       '2007M08', '2007M09', '2007M10', '2007M11', '2007M12', '2008M01',
       '2008M02', '2008M03', '2008M04', '2008M05', '2008M06', '2008M07',
       '2008M08', '2008M09', '2008M10', '2008M11', '2008M12', '2009M01',
       '2009M02', '2009M03', '2009M04', '2009M05', '2009M06', '2009M07',
       '2009M08', '2009M09', '2009M10', '2009M11', '2009M12', '2010M01',
       '2010M02', '2010M03', '2010M04', '2010M05', '2010M06', '2010M07',
       '2010M08', '2010M09', '2010M10', '2010M11', '2010M12', '2011M01',
       '2011M02', '2011M03', '2011M04', '2011M05', '2011M06', '2011M07',
       '2011M08', '2011M09', '2011M10', '2011M11', '2011M12', '2012M01',
       '2012M02', '2012M03', '2012M04', '2012M05', '2012M06', '2012M07',
       '2012M08', '2012M09', '2012M10', '2012M11', '2012M12', '2013M01',
       '2013M02', '2013M03', '2013M04', '2013M05', '2013M06', '2013M07',
       '2013M08', '2013M09', '2013M10', '2013M11', '2013M12', '2014M01',
       '2014M02', '2014M03', '2014M04', '2014M05', '2014M06', '2014M07',
       '2014M08', '2014M09', '2014M10', '2014M11', '2014M12', '2015M01',
       '2015M02', '2015M03', '2015M04', '2015M05', '2015M06', '2015M07',
       '2015M08', '2015M09', '2015M10', '2015M11', '2015M12', '2016M01',
       '2016M02', '2016M03', '2016M04', '2016M05', '2016M06', '2016M07',
       '2016M08', '2016M09', '2016M10', '2016M11', '2016M12', '2017M01',
       '2017M02', '2017M03', '2017M04', '2017M05', '2017M06', '2017M07',
       '2017M08', '2017M09', '2017M10', '2017M11', '2017M12', '2018M01',
       '2018M02', '2018M03', '2018M04', '2018M05', '2018M06', '2018M07',
       '2018M08', '2018M09', '2018M10', '2018M11', '2018M12', '2019M01',
       '2019M02', '2019M03', '2019M04', '2019M05', '2019M06', '2019M07',
       '2019M08', '2019M09', '2019M10', '2019M11', '2019M12', '2020M01',
       '2020M02', '2020M03', '2020M04', '2020M05', '2020M06', '2020M07',
       '2020M08', '2020M09', '2020M10', '2020M11', '2020M12', '2021M01',
       '2021M02', '2021M03', '2021M04', '2021M05', '2021M06', '2021M07',
       '2021M08'], dtype=object)]

现在,只需使用它来定义数据的索引,您就完成了:

代码语言:javascript
复制
>>> val.index = pd.MultiIndex.from_product(dimensions, names=data['id'])
>>> val
unit                    coicop          geo       time
Monthly rate of change  All-items HICP  Hungary   1996M02    2.4
                                                  1996M03    1.6
                                                  1996M04    1.7
                                                  1996M05    2.2
                                                  1996M06    0.8
                                                            ... 
                                        Slovakia  2021M04    0.2
                                                  2021M05    0.6
                                                  2021M06    0.4
                                                  2021M07    0.5
                                                  2021M08    0.2
Length: 921, dtype: float64

现在,使用stackunstack,您可以随意地旋转维度:

代码语言:javascript
复制
>>> val.unstack('geo')
geo                                            Hungary  Poland  Slovakia
unit                   coicop         time                              
Monthly rate of change All-items HICP 1996M02      2.4     1.4       0.3
                                      1996M03      1.6     1.6       0.2
                                      1996M04      1.7     2.0       0.3
                                      1996M05      2.2     1.3       0.5
                                      1996M06      0.8     0.9       0.2
...                                                ...     ...       ...
                                      2021M04      0.8     0.7       0.2
                                      2021M05      0.6     0.4       0.6
                                      2021M06      0.4     0.1       0.4
                                      2021M07      0.6     0.4       0.5
                                      2021M08      0.2     0.3       0.2

[307 rows x 3 columns]
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/69234585

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档