
New to Beautiful Soup. Need to scrape tables from an online report

Stack Overflow user
Asked on 2022-06-16 20:48:59
1 answer, 48 views, 0 followers, 0 votes

I want to scrape the data below using Beautiful Soup, but I can't figure it out. Please help.

 <TABLE WIDTH=100%>
        <TD VALIGN="TOP" WIDTH="30%">
            <TABLE BORDER="1" WIDTH="100%">
            <TR>
                <TH COLSPAN="3"><CENTER><B>SUMMARY</B></CENTER></TH>
            </TR>
            <TR><TD>Alberta Total Net Generation</TD><TD>9299</TD></TR>
<TR><TD>Net Actual Interchange</TD><TD>-386</TD></TR>
<TR><TD>Alberta Internal Load (AIL)</TD><TD>9685</TD></TR>
<TR><TD>Net-To-Grid Generation</TD><TD>6897</TD></TR>
<TR><TD>Contingency Reserve Required</TD><TD>518</TD></TR>
<TR><TD>Dispatched Contingency Reserve (DCR)</TD><TD>552</TD></TR>
<TR><TD>Dispatched Contingency Reserve -Gen</TD><TD>374</TD></TR>
<TR><TD>Dispatched Contingency Reserve -Other</TD><TD>178</TD></TR>
<TR><TD>LSSi Armed Dispatch</TD><TD>73</TD></TR>
<TR><TD>LSSi Offered Volume</TD><TD>73</TD></TR>

            </TABLE>

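This is the link I want to scrape: http://ets.aeso.ca/ets_web/ip/Market/Reports/CSDReportServlet

I need the summary, generation, and interchange tables as separate tables. Any help would be appreciated.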


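1 Answer

Stack Overflow user

Answered on 2022-06-16 21:38:20

I would use pd.read_html together with BeautifulSoup to read the data. Also, use the html5lib parser when parsing the page, because it contains malformed markup: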

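from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup


def get_summary(soup):
    # Innermost table (no nested table) whose <b> caption contains "SUMMARY".
    summary = soup.select_one(
        "table:has(b:-soup-contains(SUMMARY)):not(:has(table))"
    )
    summary.tr.extract()  # drop the caption row
    # Recent pandas versions expect a file-like object, not a raw HTML string.
    return pd.read_html(StringIO(str(summary)))[0]


def get_generation(soup):
    generation = soup.select_one(
        "table:has(b:-soup-contains(GENERATION)):not(:has(table))"
    )
    generation.tr.extract()  # drop the caption row
    # Promote the next row's cells to <th> so pandas uses them as column names.
    for td in generation.tr.select("td"):
        td.name = "th"
    return pd.read_html(StringIO(str(generation)))[0]


def get_interchange(soup):
    interchange = soup.select_one(
        "table:has(b:-soup-contains(INTERCHANGE)):not(:has(table))"
    )
    interchange.tr.extract()  # drop the caption row
    # Promote the next row's cells to <th> so pandas uses them as column names.
    for td in interchange.tr.select("td"):
        td.name = "th"
    return pd.read_html(StringIO(str(interchange)))[0]


url = "http://ets.aeso.ca/ets_web/ip/Market/Reports/CSDReportServlet"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on an HTTP error
# html5lib tolerates the malformed markup on this page.
soup = BeautifulSoup(response.content, "html5lib")

print(get_summary(soup))
print(get_generation(soup))
print(get_interchange(soup))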

Prints:

                                       0     1
0           Alberta Total Net Generation  9359
1                 Net Actual Interchange  -343
2            Alberta Internal Load (AIL)  9702
3                 Net-To-Grid Generation  6946
4           Contingency Reserve Required   514
5   Dispatched Contingency Reserve (DCR)   552
6    Dispatched Contingency Reserve -Gen   374
7  Dispatched Contingency Reserve -Other   178
8                    LSSi Armed Dispatch    78
9                    LSSi Offered Volume    82

            GROUP     MC   TNG  DCR
0             GAS  10836  6801   79
1           HYDRO    894   270  233
2  ENERGY STORAGE     50     0   50
3           SOLAR    936   303    0
4            WIND   2269   448    0
5           OTHER    424   273   12
6       DUAL FUEL      0     0    0
7            COAL   1266  1264    0
8           TOTAL  16675  9359  374

               PATH  ACTUAL FLOW
0  British Columbia         -230
1           Montana         -113
2      Saskatchewan            0
3             TOTAL         -343
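
To see why the selector pairs `:has(b:-soup-contains(...))` with `:not(:has(table))`, here is a minimal offline sketch. It assumes a small, well-formed sample modeled on the snippet in the question, so it uses the built-in `html.parser` instead of `html5lib`; the names `SAMPLE_HTML` and `extract_rows` are hypothetical and exist only for this illustration. The outer layout table also contains the `<b>SUMMARY</b>` caption, so without `:not(:has(table))` the outer table would match first.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed sample: an outer layout table wrapping the
# data table, as in the snippet from the question.
SAMPLE_HTML = """
<table width="100%">
  <tr><td valign="top" width="30%">
    <table border="1" width="100%">
      <tr><th colspan="3"><center><b>SUMMARY</b></center></th></tr>
      <tr><td>Alberta Total Net Generation</td><td>9299</td></tr>
      <tr><td>Net Actual Interchange</td><td>-386</td></tr>
    </table>
  </td></tr>
</table>
"""


def extract_rows(html, title):
    """Return (label, value) pairs from the innermost table captioned `title`."""
    soup = BeautifulSoup(html, "html.parser")
    # :has(b:-soup-contains(...)) keeps tables containing the caption;
    # :not(:has(table)) discards any table that wraps another table.
    table = soup.select_one(
        f'table:has(b:-soup-contains("{title}")):not(:has(table))'
    )
    if table is None:
        return []
    rows = []
    for tr in table.select("tr"):
        cells = [td.get_text(strip=True) for td in tr.select("td")]
        if len(cells) == 2:  # the caption row has <th> cells and is skipped
            rows.append((cells[0], int(cells[1])))
    return rows


print(extract_rows(SAMPLE_HTML, "SUMMARY"))
```

Returning an empty list when nothing matches is a choice made for this sketch; the functions in the answer would instead raise an `AttributeError` if the page layout changed and `select_one` returned `None`.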
Votes: 3
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/72651724
