Q: Web scraper crashes after collecting 2 pages of data

Stack Overflow user

Asked 2020-05-02 22:58:25

Answers 1, Views 42, Followers 0, Votes 0

I am scraping a website that sells iPhone cases.

The scraper should collect each product's name and price. When I run the program, it crashes with this error:

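Traceback (most recent call last):
  File "phonecases.py", line 12, in <module>
    price = content.find(class_="products-grid-price").get_text().replace('\n','')
AttributeError: 'NoneType' object has no attribute 'get_text'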

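This happens because some items are on sale. When an item is not discounted, its price element has the class products-grid-price; when it is on sale, the class is products-grid-price-sale. So the program collects the data I want until it reaches an item that is on sale, and then it crashes.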
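The failure can be reproduced without the live site. The snippet below is a minimal sketch: the HTML is hypothetical markup that only mimics the class names mentioned above, and the product names and prices are made up. It shows that `find(class_="products-grid-price")` returns `None` for a sale item, because `class_` is compared against whole class names rather than prefixes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one regular item and one sale item.
html = """
<div class="products-grid-container-out">
  <div class="products-gridname">Clear Case</div>
  <div class="products-grid-price">$9.99</div>
</div>
<div class="products-grid-container-out">
  <div class="products-gridname">Leather Case</div>
  <div class="products-grid-price-sale">$14.99</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
regular, on_sale = soup.find_all(class_="products-grid-container-out")

regular_price = regular.find(class_="products-grid-price")
sale_lookup = on_sale.find(class_="products-grid-price")

print(regular_price.get_text())  # the element exists, so get_text() works
print(sale_lookup)               # None: calling .get_text() on it raises AttributeError
```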

How can I fix my program so that it either skips the items that are on sale or collects them as a separate data point?

Here is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.cellphonecases.com/Apple-Iphone-11-C2429.html?page='
for page in range(1, 5):
    response = requests.get(url + str(page))
    soup = BeautifulSoup(response.text, 'html.parser')
    contents = soup.find_all(class_="products-grid-container-out")

    for content in contents:
        title = content.find(class_="products-gridname").get_text().replace('\n','')
        price = content.find(class_="products-grid-price").get_text().replace('\n','')
        print(title, price)

python

web-scraping

beautifulsoup

web-crawler

1 Answer

Stack Overflow user

Answered 2020-05-18 14:57:08

For example, use try/except:

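price = None
try:
    # Regular items expose the price under this class.
    price = content.find(class_="products-grid-price").get_text().replace('\n','')
except AttributeError:
    # find() returned None, so this is a sale item; use the sale class instead.
    price = content.find(class_="products-grid-price-sale").get_text().replace('\n','')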

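Python first runs the code in the try block; if that raises an exception (an error), the exception is caught and the code in the except block runs instead.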
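To make that control flow visible, here is a small sketch that reports which block produced the price. The function name `price_with_fallback` and the two HTML fragments are hypothetical, chosen only for illustration; the sketch catches `AttributeError` specifically and uses `strip()` to trim whitespace, both of which are assumptions rather than part of the answer.

```python
from bs4 import BeautifulSoup

def price_with_fallback(content):
    """Return (price_text, branch), where branch names the block that produced it."""
    try:
        text = content.find(class_="products-grid-price").get_text()
        return text.strip(), "try"
    except AttributeError:
        # find() returned None above, so look for the sale class instead.
        text = content.find(class_="products-grid-price-sale").get_text()
        return text.strip(), "except"

# Hypothetical fragments: one regular item and one sale item.
regular = BeautifulSoup(
    '<div><span class="products-grid-price">$9.99</span></div>', "html.parser"
)
sale = BeautifulSoup(
    '<div><span class="products-grid-price-sale">$14.99</span></div>', "html.parser"
)

print(price_with_fallback(regular))  # ('$9.99', 'try')
print(price_with_fallback(sale))     # ('$14.99', 'except')
```

Note that an item with neither class would still raise `AttributeError` from inside the except block, so this pattern only covers the two cases described in the question.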

Or something like:

price = None
price_field = content.find(class_="products-grid-price")
if price_field is not None:
    price = price_field.get_text()
else:
    # No regular price element, so fall back to the sale class.
    price = content.find(class_="products-grid-price-sale").get_text()

# clean price
price = price.replace('\n','')
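The question asks for either skipping sale items or collecting them as a separate data point. One way to support both, sketched here under assumptions, is to return an `on_sale` flag alongside the price and filter on it afterwards. The helper name `parse_item`, the dictionary keys, and the sample HTML are all hypothetical; only the class names come from the question.

```python
from bs4 import BeautifulSoup

def parse_item(content):
    """Return a dict with title, price and on_sale, or None if a field is missing."""
    title_tag = content.find(class_="products-gridname")
    price_tag = content.find(class_="products-grid-price")
    on_sale = False
    if price_tag is None:
        # No regular price, so try the sale class and remember that we did.
        price_tag = content.find(class_="products-grid-price-sale")
        on_sale = price_tag is not None
    if title_tag is None or price_tag is None:
        return None
    return {
        "title": title_tag.get_text(strip=True),
        "price": price_tag.get_text(strip=True),
        "on_sale": on_sale,
    }

# Hypothetical markup standing in for one page of results.
html = """
<div class="products-grid-container-out">
  <div class="products-gridname">Clear Case</div>
  <div class="products-grid-price">$9.99</div>
</div>
<div class="products-grid-container-out">
  <div class="products-gridname">Leather Case</div>
  <div class="products-grid-price-sale">$14.99</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
parsed = [parse_item(c) for c in soup.find_all(class_="products-grid-container-out")]
items = [item for item in parsed if item is not None]

# Option 1: keep everything, with sale items marked by the flag.
for item in items:
    print(item["title"], item["price"], "(sale)" if item["on_sale"] else "")

# Option 2: skip sale items entirely.
regular_only = [item for item in items if not item["on_sale"]]
```

Returning `None` for incomplete items, rather than raising, keeps the page loop running if the site adds a third price layout; whether silently dropping such items is acceptable depends on the use case.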

Votes 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/61567407
