
Q: BeautifulSoup not getting the full image address

Stack Overflow user

Asked on 2021-04-07 20:40:58

2 answers · 33 views · 0 followers · Score 0

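I'm using Beautiful Soup to scrape images from a website, but my code doesn't return the full image address that I can see when I inspect the page.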

for b in soup.select(".thumb_div.clear a"):
    imagelink = a["href"].replace("/mushrooms/", "http://www.foragingguide.com/mushrooms/")
    print(imagelink)

It should return http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg, because the page source is:

<a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg" rel="lightbox[photos]" title="Amethyst Deceiver (Laccaria amethystina)">

Instead, it returns only http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/ without the trailing .jpg filename, which I need for this to work.

Does anyone know why this happens? Thanks.

Tags: python, html, web, web-scraping, beautifulsoup

2 Answers

Stack Overflow user

Accepted answer

Answered on 2021-04-07 21:38:19

Simple solution:

for b in soup.select(".thumb_div a"):
    imagelink = b["href"]
    print(imagelink)

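It turns out the "a" in the selector had nothing to do with it. The problem was `a["href"]`, which reads a variable `a` that is not the loop variable (the loop iterates over `b`). Changing it to `b["href"]` works.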
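A minimal offline reproduction of that bug, assuming the asker's script had an earlier variable `a` bound to some other `<a>` tag whose href was the gallery folder. The inline HTML and the "Gallery" link are hypothetical, modelled on the snippet in the question; BeautifulSoup returns each href verbatim, so the truncated URL comes from reading the wrong tag, not from the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's snippet. The "Gallery"
# link stands in for whatever tag the stale `a` pointed at.
html = """
<a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/">Gallery</a>
<div class="thumb_div clear">
  <a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg"
     rel="lightbox[photos]">thumb</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Hypothetical leftover variable from earlier code: the first <a> on the page.
a = soup.find("a")

buggy, fixed = [], []
for b in soup.select(".thumb_div.clear a"):
    buggy.append(a["href"])  # bug: reads the stale `a`, not the loop element
    fixed.append(b["href"])  # fix: reads the <a> the loop is currently visiting

print(buggy)
print(fixed)
```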

Score: -1

Stack Overflow user

Answered on 2021-04-07 21:00:09

You don't need the replacement at all; just target the image source directly.

For example:

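import requests
from bs4 import BeautifulSoup


# Species page that contains the thumbnail gallery
end_point = "http://www.foragingguide.com/mushrooms/sp/amethyst_deceiver"
response = requests.get(end_point).text
# Each thumbnail is an <a> inside .thumb_div whose href is the full-size image URL
links = BeautifulSoup(response, "lxml").select(".thumb_div a")
print("\n".join(link["href"] for link in links))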

Output:

http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/88.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/90.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/91.jpg
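If some pages used relative hrefs (the case the question's `.replace("/mushrooms/", ...)` seems to be guarding against), `urllib.parse.urljoin` handles both forms without string surgery. A minimal sketch, assuming an offline HTML string instead of a live request; the function name `extract_image_links` and the relative `/mushrooms/example.jpg` link are hypothetical, added only to show the resolution behaviour, and `html.parser` is used so `lxml` isn't required.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_links(html: str, base_url: str) -> list[str]:
    """Return the href of every <a> inside .thumb_div, resolved to an absolute URL."""
    soup = BeautifulSoup(html, "html.parser")
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against base_url, so no hand-written str.replace() is needed.
    return [urljoin(base_url, link["href"]) for link in soup.select(".thumb_div a[href]")]


# Sample markup: one absolute href as on the real page, plus one hypothetical
# relative href that exists only to demonstrate urljoin.
sample = """
<div class="thumb_div clear">
  <a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg">1</a>
  <a href="/mushrooms/example.jpg">2</a>
</div>
"""

links = extract_image_links(sample, "http://www.foragingguide.com/mushrooms/sp/amethyst_deceiver")
print("\n".join(links))
```

Selecting `a[href]` rather than plain `a` skips anchors without an href, which would otherwise raise `KeyError`.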

Score: 2

Page content originally from Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/66986262
