Web page title prints as None, BeautifulSoup

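Stack Overflow user

Asked 2022-03-22 10:51:57

Answers: 1 · Views: 82 · Followers: 0 · Votes: 0

I am trying to scrape data from this website, but I cannot get the title of the web page.

My code: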

import requests
from bs4 import BeautifulSoup

base_url = "https://www.stfrancismedicalcenter.com/find-a-provider/"

content = requests.get(url = base_url).content
soup = BeautifulSoup(content, "html.parser")

profile_link = soup.find("a", {"class": "flex-top-between-block-500"}).get("href")
profile_url = base_url + profile_link[1:]

profile_content = requests.get(url = profile_url).content
profile_soup = BeautifulSoup(profile_content, "html.parser")
print(profile_soup.title.string)

This is the output:

[Running] python -u "d:\Personal\CS\Web Scrapping\first.py"
None

[Done] exited with code=0 in 3.592 seconds

I would appreciate any suggestions on this!

python

web-scraping

beautifulsoup

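1 Answer

Stack Overflow user

Accepted answer

Answered 2022-03-22 11:09:30

The issue here is that the path you concatenate for the profile is incorrect: the find-a-provider part is duplicated, so the URL becomes: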

https://www.stfrancismedicalcenter.com/find-a-provider/find-a-provider/adegbenga-a-adetola-md/
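The duplication can be reproduced offline with plain string concatenation. This is only a sketch: the href value below is an assumption inferred from the corrected URL in this answer, not something read from the live page.

```python
# Listing URL taken from the question.
base_url = "https://www.stfrancismedicalcenter.com/find-a-provider/"

# Assumed href of the first profile link (inferred, not scraped).
profile_link = "/find-a-provider/adegbenga-a-adetola-md/"

# The question's approach: drop the leading slash, then append to the listing URL.
with_slice = base_url + profile_link[1:]

# Appending the href unchanged gives a double slash instead.
without_slice = base_url + profile_link

print(with_slice)
print(without_slice)
```

Either way the find-a-provider segment appears twice, because the href is already a path from the site root rather than a path relative to the listing page.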

Instead, define a dedicated baseUrl that contains only the scheme and host, and use it to build the URL:

profile_url = 'https://www.stfrancismedicalcenter.com' + profile_link

or

baseUrl = 'https://www.stfrancismedicalcenter.com'
profile_url = baseUrl + profile_link

Example

import requests
from bs4 import BeautifulSoup

url = "https://www.stfrancismedicalcenter.com/find-a-provider"
baseUrl = 'https://www.stfrancismedicalcenter.com'

content = requests.get(url).content
soup = BeautifulSoup(content, "html.parser")

profile_link = soup.find("a", {"class": "flex-top-between-block-500"}).get("href")
profile_url = baseUrl + profile_link

profile_content = requests.get(url = profile_url).content
profile_soup = BeautifulSoup(profile_content, "html.parser")
print(profile_soup.title.text)

Output

Adegbenga A. Adetola MD
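As an alternative to a hand-written baseUrl, the standard library's urllib.parse.urljoin resolves an href against the URL of the page it came from. The sketch below is not part of the original answer; the helper name resolve_href is hypothetical and the href values are assumed for illustration.

```python
from urllib.parse import urljoin

# URL of the listing page the link was scraped from (from the question).
page_url = "https://www.stfrancismedicalcenter.com/find-a-provider/"


def resolve_href(page: str, href: str) -> str:
    """Hypothetical helper: resolve an href the way a browser would."""
    return urljoin(page, href)


# Root-relative href (assumed value): the path of page_url is replaced.
root_relative = resolve_href(page_url, "/find-a-provider/adegbenga-a-adetola-md/")

# Page-relative href (assumed value): resolved against the listing directory.
page_relative = resolve_href(page_url, "adegbenga-a-adetola-md/")

print(root_relative)
print(page_relative)
```

Both calls yield the same profile URL, so the scraper does not need to know in advance whether the site emits root-relative or page-relative links.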

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei engine for the IT domain.

Original link:

https://stackoverflow.com/questions/71570817
