I want to scrape the data below using Beautiful Soup, but I can't figure out how. Please help.
<TABLE WIDTH=100%>
<TD VALIGN="TOP" WIDTH="30%">
<TABLE BORDER="1" WIDTH="100%">
<TR>
<TH COLSPAN="3"><CENTER><B>SUMMARY</B></CENTER></TH>
</TR>
<TR><TD>Alberta Total Net Generation</TD><TD>9299</TD></TR>
<TR><TD>Net Actual Interchange</TD><TD>-386</TD></TR>
<TR><TD>Alberta Internal Load (AIL)</TD><TD>9685</TD></TR>
<TR><TD>Net-To-Grid Generation</TD><TD>6897</TD></TR>
<TR><TD>Contingency Reserve Required</TD><TD>518</TD></TR>
<TR><TD>Dispatched Contingency Reserve (DCR)</TD><TD>552</TD></TR>
<TR><TD>Dispatched Contingency Reserve -Gen</TD><TD>374</TD></TR>
<TR><TD>Dispatched Contingency Reserve -Other</TD><TD>178</TD></TR>
<TR><TD>LSSi Armed Dispatch</TD><TD>73</TD></TR>
<TR><TD>LSSi Offered Volume</TD><TD>73</TD></TR>
</TABLE>
This is the link I want to scrape: http://ets.aeso.ca/ets_web/ip/Market/Reports/CSDReportServlet
I need the summary, generation, and interchange tables separately. Any help would be appreciated.
Posted on 2022-06-16 21:38:20
I would use pd.read_html together with BeautifulSoup to read the data. Also, parse the page with the html5lib parser, because the page contains malformed markup:
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup


def get_summary(soup):
    # Innermost table whose bold title contains "SUMMARY"
    summary = soup.select_one(
        "table:has(b:-soup-contains(SUMMARY)):not(:has(table))"
    )
    summary.tr.extract()  # drop the title row
    # StringIO avoids the literal-HTML deprecation warning in newer pandas
    return pd.read_html(StringIO(str(summary)))[0]


def get_generation(soup):
    generation = soup.select_one(
        "table:has(b:-soup-contains(GENERATION)):not(:has(table))"
    )
    generation.tr.extract()  # drop the title row
    # Turn the next row into header cells so pandas uses it as column names
    for td in generation.tr.select("td"):
        td.name = "th"
    return pd.read_html(StringIO(str(generation)))[0]


def get_interchange(soup):
    interchange = soup.select_one(
        "table:has(b:-soup-contains(INTERCHANGE)):not(:has(table))"
    )
    interchange.tr.extract()  # drop the title row
    # Turn the next row into header cells so pandas uses it as column names
    for td in interchange.tr.select("td"):
        td.name = "th"
    return pd.read_html(StringIO(str(interchange)))[0]


url = "http://ets.aeso.ca/ets_web/ip/Market/Reports/CSDReportServlet"
soup = BeautifulSoup(requests.get(url).content, "html5lib")

print(get_summary(soup))
print(get_generation(soup))
print(get_interchange(soup))

Prints:
                                       0     1
0           Alberta Total Net Generation  9359
1                 Net Actual Interchange  -343
2            Alberta Internal Load (AIL)  9702
3                 Net-To-Grid Generation  6946
4           Contingency Reserve Required   514
5   Dispatched Contingency Reserve (DCR)   552
6    Dispatched Contingency Reserve -Gen   374
7  Dispatched Contingency Reserve -Other   178
8                    LSSi Armed Dispatch    78
9                    LSSi Offered Volume    82

            GROUP     MC   TNG  DCR
0             GAS  10836  6801   79
1           HYDRO    894   270  233
2  ENERGY STORAGE     50     0   50
3           SOLAR    936   303    0
4            WIND   2269   448    0
5           OTHER    424   273   12
6       DUAL FUEL      0     0    0
7            COAL   1266  1264    0
8           TOTAL  16675  9359  374

               PATH  ACTUAL FLOW
0  British Columbia         -230
1           Montana         -113
2      Saskatchewan            0
3             TOTAL         -343

https://stackoverflow.com/questions/72651724
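
The selector in the answer can be checked offline. The sketch below runs it against a shortened copy of the HTML fragment from the question, so no network access is needed. It is only an illustration under a few assumptions: it uses the built-in "html.parser" instead of html5lib to keep dependencies down (the fragment is small enough to parse either way), it builds a plain dict instead of a DataFrame, and the names HTML, with_title, innermost and rows are made up for this example.

```python
from bs4 import BeautifulSoup

# Shortened copy of the fragment from the question; the outer table is
# left unclosed, exactly as it was posted.
HTML = """
<TABLE WIDTH=100%>
<TD VALIGN="TOP" WIDTH="30%">
<TABLE BORDER="1" WIDTH="100%">
<TR>
<TH COLSPAN="3"><CENTER><B>SUMMARY</B></CENTER></TH>
</TR>
<TR><TD>Alberta Total Net Generation</TD><TD>9299</TD></TR>
<TR><TD>Net Actual Interchange</TD><TD>-386</TD></TR>
<TR><TD>Alberta Internal Load (AIL)</TD><TD>9685</TD></TR>
</TABLE>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Both the outer layout table and the inner data table contain <b>SUMMARY</b>.
with_title = soup.select("table:has(b:-soup-contains(SUMMARY))")

# :not(:has(table)) keeps only tables with no nested table, i.e. the inner one.
innermost = soup.select(
    "table:has(b:-soup-contains(SUMMARY)):not(:has(table))"
)

summary = innermost[0]
summary.tr.extract()  # drop the title row, as the answer does

# Collect the remaining two-cell rows as label -> integer value.
rows = {}
for tr in summary.find_all("tr"):
    label, value = (td.get_text(strip=True) for td in tr.find_all("td"))
    rows[label] = int(value)

print(len(with_title), len(innermost))
print(rows)
```

Without the :not(:has(table)) part, select_one would return the outer layout table first, because it also contains the bold title; that is why the answer narrows the match to the innermost table.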