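I am trying to scrape food image links from restaurant pages on Postmates. The restaurant I am testing with is: https://postmates.com/merchant/fruitive-washington-96807
The src attribute that holds the image link is proving hard for me to extract. I have tried everything I know without success. I always get something like an empty list ([]), a "list index out of range" error, a NoneType error, or some other error.
The relevant markup on the page looks like this: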
<div id="" class="e1tw3vxs2 css-aktk0j e1qfcze90">
  <div>
    <img alt="Spring Pesto from Fruitive. Order online." src="https://raster-static.postmates.com/?url=https%3A%2F%2Fitems-static.postmates.com%2Fuploads%2Fmedia%2F7b289988-5d19-4cfc-80a6-ce88a7a05f41%2Foriginal.jpg%3Fv%3D63784935843&quality=85&w=320&h=0&mode=auto&format=webp&v=4"
         class="css-1hyfx7x e1qfcze94">
    <div title="Spring Pesto from Fruitive. Order online." class="css-1ggm7mr e1qfcze91"
         style="background-image: url(&quot;https://raster-static.postmates.com/?url=https%3A%2F%2Fitems-static.postmates.com%2Fuploads%2Fmedia%2F7b289988-5d19-4cfc-80a6-ce88a7a05f41%2Foriginal.jpg%3Fv%3D63784935843&quality=85&w=320&h=0&mode=auto&format=webp&v=4&quot;); opacity: 1;"></div>
  </div>
  <div class="css-f85l49 e1qfcze92"></div>
</div>
My scraping code is:
import requests
from bs4 import BeautifulSoup as bs

header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"}
page_code = requests.get('https://postmates.com/merchant/fruitive-washington-96807', headers=header)
soup = bs(page_code.content, 'html.parser')
page_code = soup.find_all('div', {'class': 'css-135ydxp e1u06svg2'})
for i in page_code:
    all_element_products = i.find_all('div', {'class': 'product-container css-1kry540 e1tw3vxs3'})
    for a_e_p in all_element_products:
        try:
            img_link = a_e_p.find_all('div', {'class': 'e1tw3vxs2 css-aktk0j e1qfcze90'})
        except Exception as Err:
            print(Err)
            print()
            img_link = '-'
        print(img_link)
Does anyone here have a solution?
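Posted on 2021-04-23 06:29:37
The information you see on the page is rendered dynamically, so the product markup is not present in the HTML that requests receives. The data is, however, embedded in the page as JSON. The following example shows how to load it with the re and json modules: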
import re
import json
import requests

url = "https://postmates.com/merchant/fruitive-washington-96807"
html_doc = requests.get(url).text
# the page state is assigned to window.__PRELOADED_STATE__ inside a script tag
data = re.search(r"window\.__PRELOADED_STATE__ = ({.*?});", html_doc).group(1)
data = json.loads(data)
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for cat in data["cart"]["categories"]:
    for product in cat["products"]:
        # print only products with image:
        if "img" in product:
            print(
                "{:<30} {}".format(
                    product["name"], product["img"]["originalUrl"]
                )
            )
Prints:
Loaded Avocado Toast           https://items-static.postmates.com/uploads/media/a9f25be8-fd4a-4615-8f50-5f767d76ade9/original.jpg?v=63784935508
Pink Punch                     https://items-static.postmates.com/uploads/media/1db56bd1-9128-4ee6-837f-63c2db004494/original.jpg?v=63784935614
Tropical Bowl                  https://items-static.postmates.com/uploads/media/150f8376-ab6d-45e0-b21f-35c648f31814/original.jpg?v=63784935705
Beach Breeze                   https://items-static.postmates.com/uploads/media/c8bb194b-d12f-4684-a542-76c17310538f/original.jpg?v=63784935751
Spring Pesto                   https://items-static.postmates.com/uploads/media/7b289988-5d19-4cfc-80a6-ce88a7a05f41/original.jpg?v=63784935843
https://stackoverflow.com/questions/67221523
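A non-greedy regex such as `({.*?});` can cut the JSON short if a string value happens to contain `};`, and `re.search(...)` returns None when the marker is missing, which leads to exactly the kind of NoneType error mentioned in the question. Below is a minimal sketch of a more defensive variant that works on an HTML string you have already downloaded. The function names (`extract_preloaded_state`, `iter_product_images`, `original_from_raster`) are hypothetical helpers, not part of any library, and the `cart` / `categories` / `products` / `img` / `originalUrl` layout is assumed from the answer above; the site may have changed since. The last helper shows how the raster-static `src` from the question relates to the original image link: the original link appears to be carried percent-encoded in the `url` query parameter.

```python
import json
import re
from urllib.parse import parse_qs, urlsplit

# Marker taken from the answer's regex; the real page may differ today.
STATE_MARKER = re.compile(r"window\.__PRELOADED_STATE__\s*=\s*")


def extract_preloaded_state(html_doc):
    """Return the embedded state as a dict, or None if it cannot be found."""
    match = STATE_MARKER.search(html_doc)
    if match is None:
        return None
    try:
        # raw_decode parses exactly one JSON value starting at the given
        # index and ignores whatever text follows it.
        state, _end = json.JSONDecoder().raw_decode(html_doc, match.end())
    except json.JSONDecodeError:
        return None
    return state


def iter_product_images(state):
    """Yield (name, original image URL) for products that have an image."""
    # Key names are assumed from the answer's example.
    for cat in state.get("cart", {}).get("categories", []):
        for product in cat.get("products", []):
            img = product.get("img")
            if img and "originalUrl" in img:
                yield product.get("name"), img["originalUrl"]


def original_from_raster(src):
    """Decode the `url` query parameter of a raster-static image link."""
    values = parse_qs(urlsplit(src).query).get("url")
    return values[0] if values else None
```

With `html_doc` fetched as in the answer, `extract_preloaded_state(html_doc)` would give the state dict (or None), and `iter_product_images` can then be looped over in place of the nested for loops.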