Getting an HTML element based on the element before it with Beautiful Soup (EPA website)

Stack Overflow user
Asked 2018-10-29 16:44:26
Answers: 2 · Views: 36 · Followers: 0 · Votes: 2

I want to print the "Civil Penalty" section of EPA settlement fact sheets such as https://www.epa.gov/enforcement/chevron-settlement-information-sheet and https://www.epa.gov/enforcement/ngl-crude-logistics-llc-clean-air-act-settlement, scraping it from the following HTML source:

<h2 id="civil">Civil Penalty</h2>
<p>Chevron U.S.A. will pay a $2.95 million civil penalty, of which $2,492,750 will be paid to the United States and $457,250 to the State of Mississippi.</p>

I want to get the paragraph that states the penalty ("Chevron U.S.A. will pay a $2.95 million civil penalty...").

The structure is the same on every settlement fact sheet, for example:

<h2 id="civil">Civil Penalty</h2>
<p>NGL will pay a civil penalty of $25 million. The penalty is based, in part, on the company’s limited ability to pay a larger penalty.</p>

I found the similar question "Getting an element before a string with Beautiful Soup", but it is not quite the same as my problem.

Here is the skeleton of my code:

import requests
from bs4 import BeautifulSoup
import sys

for i in ['chevron-settlement-information-sheet', 'ngl-crude-logistics-llc-clean-air-act-settlement', 'derive-systems-clean-air-act-settlement']:

    page = requests.get("https://www.epa.gov/enforcement/"+i)
    soup = BeautifulSoup(page.content, 'html.parser')

    data = []

    for result in soup.find_all('h2', id='civil'):
        data.append(result)

print(data)

How do I print the <p> that comes directly after <h2 id="civil">?


2 Answers

Stack Overflow user

Accepted answer

Posted 2018-10-29 17:09:43

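One reason you may not be getting the result you are looking for is that you are appending /history to the URL, which leads to a 404 error page. If you remove that part and then use findNext('p') to get the next paragraph element on the page after the <h2 id="civil"> element, you will get the expected result: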

import requests
from bs4 import BeautifulSoup

for url in ['chevron-settlement-information-sheet', 'ngl-crude-logistics-llc-clean-air-act-settlement', 'derive-systems-clean-air-act-settlement']:

    page = requests.get("https://www.epa.gov/enforcement/" + url)
    soup = BeautifulSoup(page.content, 'html.parser')

    # Find the "Civil Penalty" heading, then the first <p> after it in document order
    result = soup.find('h2', {'id': 'civil'}).findNext('p')
    print(result.text)

This prints:

Chevron U.S.A. will pay a $2.95 million civil penalty, of which $2,492,750 will be paid to the United States and $457,250 to the State of Mississippi.
NGL will pay a civil penalty of $25 million. The penalty is based, in part, on the company’s limited ability to pay a larger penalty.
Derive will pay a civil penalty of $300,000, as the company has limited financial ability to pay a higher penalty. 
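A network-free sketch of the same approach, run against the two snippets quoted in the question. `civil_penalty_text` is a hypothetical helper name; `find_next` is Beautiful Soup 4's snake_case name for `findNext`. The `None` checks are an addition: without them, a page that lacks the heading (such as a 404 page) makes `soup.find(...)` return `None` and the chained call raise `AttributeError`.

```python
from bs4 import BeautifulSoup

# The two "Civil Penalty" snippets quoted in the question (page HTML trimmed to the relevant part).
CHEVRON_HTML = """
<h2 id="civil">Civil Penalty</h2>
<p>Chevron U.S.A. will pay a $2.95 million civil penalty, of which $2,492,750 will be paid to the United States and $457,250 to the State of Mississippi.</p>
"""

NGL_HTML = """
<h2 id="civil">Civil Penalty</h2>
<p>NGL will pay a civil penalty of $25 million. The penalty is based, in part, on the company’s limited ability to pay a larger penalty.</p>
"""


def civil_penalty_text(html):
    """Return the text of the first <p> after <h2 id="civil">, or None if either is missing."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h2", id="civil")
    if heading is None:
        # e.g. a 404 page, or a fact sheet laid out differently
        return None
    paragraph = heading.find_next("p")
    return paragraph.get_text(strip=True) if paragraph is not None else None


for html in (CHEVRON_HTML, NGL_HTML):
    print(civil_penalty_text(html))
```

When fetching live pages, calling `page.raise_for_status()` right after `requests.get(...)` makes a 404 raise `requests.HTTPError` instead of silently parsing the error page.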
Votes: 0

Stack Overflow user

Posted 2018-10-29 17:06:38

You can try the adjacent sibling selector, +:

# select() returns a list of matches; [0] is the first one
p = soup.select('#civil + p')
print(p[0].getText())

This selects only a p element that is the immediate next sibling of the #civil element.
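To show how `+` differs from the accepted answer's `findNext('p')`, here is a sketch on hypothetical markup (not taken from the EPA pages) in which another element sits between the heading and the paragraph. It uses `select_one`, which returns the first match or `None`, and the CSS `~` general-sibling combinator; both are supported by Beautiful Soup 4.7+ through soupsieve.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an extra <div> sits between the heading and the paragraph.
HTML = """
<div id="content">
  <h2 id="civil">Civil Penalty</h2>
  <div class="note">Hypothetical note</div>
  <p>Penalty paragraph (placeholder).</p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# '+' (adjacent sibling): the element right after #civil must itself be a <p>.
adjacent = soup.select_one("#civil + p")

# '~' (general sibling): any later <p> that shares #civil's parent.
general = soup.select_one("#civil ~ p")

# find_next: the next <p> anywhere later in document order, sibling or not.
following = soup.find("h2", id="civil").find_next("p")

print(adjacent)  # None: the element right after the heading is the <div>
print(general.get_text())
print(following.get_text())
```

So `+` is the stricter choice: it finds nothing if the layout ever puts another element first, while `find_next` keeps searching and, if the expected paragraph is missing, may return a paragraph from a later section.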

Votes: 1
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link: https://stackoverflow.com/questions/53050160
