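I am using Beautiful Soup to scrape images from a website, but my code does not return the full image address that is visible when I inspect the page.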
for b in soup.select(".thumb_div.clear a"):
    imagelink = a["href"].replace("/mushrooms/", "http://www.foragingguide.com/mushrooms/")
    print(imagelink)

This should return http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg, because the page source is:

<a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg" rel="lightbox[photos]" title="Amethyst Deceiver (Laccaria amethystina)">

Instead, it returns only http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/ without the trailing .jpg filename, which is required for the link to work.
Does anyone know why this happens? Thanks.
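Posted on 2021-04-07 21:38:19
A simple solution:

for b in soup.select(".thumb_div a"):
    imagelink = b["href"]
    print(imagelink)

It turns out that the "a" in a["href"] has nothing to do with the <a> tag. The name in front of ["href"] has to be the loop variable, and the loop variable here is b, so no "a" bound to the current element exists. Changing the code to b["href"] works.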
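The point about the loop variable can be checked without touching the network. The following is a minimal sketch, assuming markup shaped like the snippet quoted in the question; the HTML string and the image_links helper are hypothetical and only serve as illustration. It uses the built-in "html.parser" backend so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the <a> tag quoted in the question.
HTML = """
<div class="thumb_div clear">
  <a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg" rel="lightbox[photos]">87</a>
  <a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/88.jpg" rel="lightbox[photos]">88</a>
</div>
"""


def image_links(html):
    """Return the href of every <a> nested inside an element with class thumb_div."""
    soup = BeautifulSoup(html, "html.parser")
    # `tag` is the loop variable, so it is the only name bound to the current <a> element.
    return [tag["href"] for tag in soup.select(".thumb_div a")]


links = image_links(HTML)
for link in links:
    print(link)
```

Because each href in this sample is already an absolute URL, no string replacement is applied. Whether that holds for every link on the real page is an assumption worth verifying against the live markup.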
Posted on 2021-04-07 21:00:09
You don't need the replace at all; just target the image source directly.
For example:
import requests
from bs4 import BeautifulSoup

end_point = "http://www.foragingguide.com/mushrooms/sp/amethyst_deceiver"
# Download the page HTML as text.
response = requests.get(end_point).text
# Select every <a> inside an element with class thumb_div.
soup = BeautifulSoup(response, "lxml").select(".thumb_div a")
print("\n".join(i["href"] for i in soup))

Output:
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/88.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/90.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/91.jpg
https://stackoverflow.com/questions/66986262