MorningStar KeyStat to pandas DataFrame

Stack Overflow user

Asked on 2019-01-19 04:30:38

1 answer, 223 views, 0 followers, 1 vote

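I'm trying to read the key stats (keyStat) from MorningStar, and I know the data is wrapped in JSON. So far I can make a request that returns the JSON and parse its HTML payload with BeautifulSoup: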

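import requests
from bs4 import BeautifulSoup
url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=tou&culture=en-CA&region=CAN'
lm_json = requests.get(url).json()
# "ksContent" holds an HTML fragment; parse it with BeautifulSoup
ksContent = BeautifulSoup(lm_json["ksContent"], "html.parser")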

The 'ksContent' value is HTML that contains the actual data as a table. I'm not a fan of HTML, so how can I turn all of it into a nice pandas DataFrame? The table is long, so here is an excerpt:

     <table cellpadding="0" cellspacing="0" class="r_table1 text2">
     <colgroup>
        <col width="23%"/>
        <col span="11" width="7%"/>
     </colgroup>
     <thead>
        <tr>
           <th align="left" scope="row"></th>
           <th align="right" id="Y0" scope="col">2008-12</th>
           <th align="right" id="Y1" scope="col">2009-12</th>
           <th align="right" id="Y2" scope="col">2010-12</th>
           <th align="right" id="Y3" scope="col">2011-12</th>
           <th align="right" id="Y4" scope="col">2012-12</th>
           <th align="right" id="Y5" scope="col">2013-12</th>
           <th align="right" id="Y6" scope="col">2014-12</th>
           <th align="right" id="Y7" scope="col">2015-12</th>
           <th align="right" id="Y8" scope="col">2016-12</th>
           <th align="right" id="Y9" scope="col">2017-12</th>
           <th align="right" id="Y10" scope="col">TTM</th>
        </tr>
     </thead>
     <tbody>
        <tr class="hr">
           <td colspan="12"></td>
        </tr>
        <tr>
           <th class="row_lbl" id="i0" scope="row">Revenue <span>CAD Mil</span></th>
           <td align="right" headers="Y0 i0">—</td>
           <td align="right" headers="Y1 i0">40</td>
           <td align="right" headers="Y2 i0">212</td>
           <td align="right" headers="Y3 i0">349</td>
           <td align="right" headers="Y4 i0">442</td>
           <td align="right" headers="Y5 i0">759</td>
           <td align="right" headers="Y6 i0">1,379</td>
           <td align="right" headers="Y7 i0">1,074</td>
           <td align="right" headers="Y8 i0">1,125</td>
           <td align="right" headers="Y9 i0">1,662</td>
           <td align="right" headers="Y10 i0">1,760</td>
        </tr> ...

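It defines a header row whose cells Y0, Y1 ... Y10 hold the actual dates, and each data cell in the following rows refers back to them through its headers attribute (for example headers="Y1 i0").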
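If you would rather stay with BeautifulSoup, the id/headers links can be resolved directly. This is a minimal sketch, assuming the HTML holds a single table shaped like the excerpt above. SAMPLE_HTML (a trimmed copy of that excerpt with three of the eleven year columns), table_to_frame and to_number are hypothetical names introduced for illustration; it needs pandas and beautifulsoup4 installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical trimmed copy of the question's excerpt: 3 of the 11 year
# columns, with the separator row's colspan adjusted to match.
SAMPLE_HTML = """
<table cellpadding="0" cellspacing="0" class="r_table1 text2">
  <thead>
    <tr>
      <th align="left" scope="row"></th>
      <th align="right" id="Y0" scope="col">2008-12</th>
      <th align="right" id="Y1" scope="col">2009-12</th>
      <th align="right" id="Y10" scope="col">TTM</th>
    </tr>
  </thead>
  <tbody>
    <tr class="hr"><td colspan="4"></td></tr>
    <tr>
      <th class="row_lbl" id="i0" scope="row">Revenue <span>CAD Mil</span></th>
      <td align="right" headers="Y0 i0">—</td>
      <td align="right" headers="Y1 i0">40</td>
      <td align="right" headers="Y10 i0">1,760</td>
    </tr>
  </tbody>
</table>
"""


def to_number(text: str) -> float:
    """Convert '1,760' to 1760.0; anything non-numeric (e.g. '—') becomes NaN."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return float("nan")


def table_to_frame(html: str) -> pd.DataFrame:
    """Hypothetical helper: resolve each <td headers="<col id> <row id>">
    against the ids of the header cells and build a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    # Column ids ("Y0", ...) -> date labels ("2008-12", ...), in table order.
    col_labels = {th["id"]: th.get_text(strip=True)
                  for th in soup.find("thead").find_all("th", id=True)}
    # Row ids ("i0", ...) -> row labels ("Revenue CAD Mil", ...).
    row_labels = {th["id"]: th.get_text(" ", strip=True)
                  for th in soup.find("tbody").find_all("th", id=True)}

    records = {}
    # Only real data cells carry a headers attribute; the separator <td> does not.
    for td in soup.find_all("td", headers=True):
        headers = td["headers"]
        if isinstance(headers, str):  # bs4 normally returns a list here
            headers = headers.split()
        col_id, row_id = headers
        row = records.setdefault(row_labels[row_id], {})
        row[col_labels[col_id]] = to_number(td.get_text(strip=True))

    frame = pd.DataFrame.from_dict(records, orient="index")
    # Keep the columns in the same order as the table header.
    return frame.reindex(columns=list(col_labels.values()))


df = table_to_frame(SAMPLE_HTML)
print(df)
```

Against the real response you would call it once per table, e.g. for each element of ksContent.find_all("table"), since the fragment may contain several tables.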

Thanks for your help!

Tags: beautifulsoup, python-3.x, pandas

1 Answer

Stack Overflow user

Accepted answer

Answered on 2019-01-19 15:05:01

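You can use pandas.read_html() to convert it into a list of DataFrames, one per <table> element. read_html() needs an HTML parser to be installed (lxml, or BeautifulSoup together with html5lib):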

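import requests
import pandas as pd
from io import StringIO
url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=tou&culture=en-CA&region=CAN'
lm_json = requests.get(url).json()
# pandas 2.1+ deprecates passing a literal HTML string, so wrap it in StringIO
df_list = pd.read_html(StringIO(lm_json["ksContent"]))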
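To try read_html() without hitting the network, this minimal sketch parses a hypothetical trimmed copy of the question's table excerpt (SAMPLE_HTML, three of the eleven year columns). The index_col, thousands and na_values arguments are standard read_html() parameters; choosing them this way is an assumption about how you want the key stats laid out. It needs pandas plus lxml (or BeautifulSoup with html5lib).

```python
from io import StringIO

import pandas as pd

# Hypothetical trimmed copy of the question's excerpt: 3 of the 11 year
# columns, with the separator row's colspan adjusted to match.
SAMPLE_HTML = """
<table cellpadding="0" cellspacing="0" class="r_table1 text2">
  <thead>
    <tr>
      <th align="left" scope="row"></th>
      <th align="right" id="Y0" scope="col">2008-12</th>
      <th align="right" id="Y1" scope="col">2009-12</th>
      <th align="right" id="Y10" scope="col">TTM</th>
    </tr>
  </thead>
  <tbody>
    <tr class="hr"><td colspan="4"></td></tr>
    <tr>
      <th class="row_lbl" id="i0" scope="row">Revenue <span>CAD Mil</span></th>
      <td align="right" headers="Y0 i0">—</td>
      <td align="right" headers="Y1 i0">40</td>
      <td align="right" headers="Y10 i0">1,760</td>
    </tr>
  </tbody>
</table>
"""

tables = pd.read_html(
    StringIO(SAMPLE_HTML),  # file-like wrapper instead of a literal string
    index_col=0,            # row labels such as "Revenue CAD Mil" become the index
    thousands=",",          # "1,760" -> 1760 (already read_html's default)
    na_values=["—"],        # Morningstar's em-dash placeholder -> NaN
)
df = tables[0]  # one DataFrame per <table> found
print(df)
```

The empty separator row (`<tr class="hr">`) still comes through as a row of NaN values, which is why a cleanup step is usually needed afterwards.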

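You can then iterate over the list to get the DataFrames one by one. To remove rows that are entirely NaN (such as the empty separator rows), use dropna(how="all"); plain dropna() uses how="any" and also drops rows that have only some missing values.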
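The difference between how="any" and how="all" matters here because the Revenue row's 2008-12 value is the em-dash placeholder. A minimal sketch, assuming df_list looks like read_html() output: the one-element df_list below is a hypothetical hand-built stand-in using the Revenue values from the question's excerpt plus an all-NaN separator row; pd.concat then stacks the cleaned tables into one DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the list returned by read_html(): one table with
# an all-NaN separator row (from <tr class="hr">) and the Revenue row from
# the question, whose 2008-12 value is missing.
df_list = [
    pd.DataFrame(
        {"2008-12": [np.nan, np.nan], "2009-12": [np.nan, 40.0], "TTM": [np.nan, 1760.0]},
        index=[np.nan, "Revenue CAD Mil"],
    ),
]

cleaned = []
for df in df_list:
    # how="all" drops only rows where every value is NaN (the separator rows);
    # the default how="any" would also drop Revenue because 2008-12 is NaN.
    cleaned.append(df.dropna(how="all"))

# Stack the per-section tables into a single DataFrame.
key_stats = pd.concat(cleaned)
print(key_stats)
```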

[Screenshot: sample output in a Jupyter Notebook]

Votes: 0

Page content provided by Stack Overflow; translation supported by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/54261024
