
Q: I can't find the web page content with my BeautifulSoup package

Stack Overflow user

Asked 2022-04-22 11:26:16

1 answer, 43 views, 0 followers, score -1

I'm trying to write a scraper that collects the number of datasets currently listed on this site (data.gov).

Here is my code:

from requests import exceptions
import requests
from bs4 import BeautifulSoup


site='https://data.gov/index.html/'

try:
    html_content=requests.get(site).text

except exceptions.RequestException as e:
    print('there is a problem with reaching this site')

soup=BeautifulSoup(html_content, 'lxml')

    
needed_text=soup.find('label',{'for':'search-header'})

for text in needed_text:
    try:
        final_text=text.find('a').attrs['href']
        print('there are {} data sets currently listed on data.gov'.format(final_text.get_text()))
    except:
        continue

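However, when I run this code, it doesn't produce any results.

I printed the site's HTML and couldn't find the specific data I need. I can see it in my browser, but I can't find it in the HTML I get in my IDE.

Please help.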

python

web-scraping

beautifulsoup

1 answer

Stack Overflow user

Accepted answer

Answered 2022-04-23 10:25:59

The URL is wrong and returns a 404. Check it yourself.
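One reason the 404 goes unnoticed: `requests` does not raise an exception for an HTTP error status by default, so the question's `except exceptions.RequestException` branch never runs and the error page is handed straight to BeautifulSoup. A minimal sketch of a fetch step that fails loudly instead, assuming a hypothetical helper `fetch_html` that takes the getter as a parameter (pass `requests.get` in real use; any callable returning an object with `.raise_for_status()` and `.text` works, which keeps it easy to test offline):

```python
from typing import Any, Callable


def fetch_html(url: str, get: Callable[..., Any]) -> str:
    """Fetch `url` and return its HTML, raising on HTTP error statuses.

    `get` is assumed to behave like requests.get: it is called with the URL
    and a timeout, and returns an object with .raise_for_status() and .text.
    """
    response = get(url, timeout=10)
    # requests does not raise on a 404 by itself; raise_for_status() turns any
    # 4xx/5xx status into requests.exceptions.HTTPError instead of returning
    # the error page as if it were the real content.
    response.raise_for_status()
    return response.text
```

With the real library this would be `html = fetch_html('https://data.gov', requests.get)`. Because `HTTPError` is a subclass of `RequestException`, the existing `except exceptions.RequestException` block would then catch a wrong URL such as `https://data.gov/index.html/`.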

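Also, it's probably a good idea to move the soup part of the code into the try/except block. Finally, you don't need a for loop, because only one element contains the data you want.

Try this: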
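Iterating over a Tag, as the question's `for text in needed_text` loop does, walks its children one by one, and the bare `except: continue` hides whatever error each iteration raises. `select_one` instead returns the single first match, or `None` when nothing matches. A small sketch of that, with two assumptions: the markup in `SAMPLE_HTML` is hypothetical (modelled on the answer's `small > a[href]` selector and the question's `search-header` label, not copied from data.gov), and it uses Python's built-in `"html.parser"` rather than lxml so it runs without lxml installed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real data.gov page may differ.
SAMPLE_HTML = """
<label for="search-header">
  <small>Search over <a href="/datasets">335,221 datasets</a></small>
</label>
"""


def extract_count_text(html: str):
    """Return the text of the first <a href> directly inside a <small>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("small > a[href]")  # one Tag, or None if absent
    if link is None:
        return None
    return link.get_text(strip=True)
```

Checking for `None` explicitly makes a missing element visible, where the original loop would silently print nothing.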

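import requests
from requests import exceptions
from bs4 import BeautifulSoup


site = 'https://data.gov'  # the root URL; 'https://data.gov/index.html/' returns a 404

try:
    response = requests.get(site, timeout=10)
    # requests does not raise on a 404 by itself; make HTTP errors raise here
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'lxml')
    # only one element holds the count: the <a href> directly inside a <small> tag
    needed_text = soup.select_one("small > a[href]").getText()
    print(needed_text)
except exceptions.RequestException as e:
    print('there is a problem with reaching this site:', e)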

Output:

335,221 datasets
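The answer prints the link text as-is. If you want the count as an actual number, for example to fill in the question's "there are {} data sets currently listed on data.gov" message, a hypothetical helper `parse_dataset_count` could pull out the first run of digits and thousands separators. This is a sketch that assumes the link text starts its number with a digit and uses commas as separators, as in the output above:

```python
import re


def parse_dataset_count(link_text: str) -> int:
    """Turn link text such as '335,221 datasets' into the integer 335221."""
    # First digit followed by any mix of digits and commas, e.g. '335,221'
    match = re.search(r"\d[\d,]*", link_text)
    if match is None:
        raise ValueError(f"no number found in {link_text!r}")
    return int(match.group().replace(",", ""))
```

Usage with the answer's variable: `print('there are {} data sets currently listed on data.gov'.format(parse_dataset_count(needed_text)))`.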

Score: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/71967985

