Q: Getting UnboundLocalError when web scraping with Python

Stack Overflow user

Asked 2020-04-01 01:37:50

Answers: 1 | Views: 114 | Followers: 0 | Votes: 0

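I wrote an Amazon web-scraping script. It works for most Amazon products, but for some specific products it fails with this error: UnboundLocalError: local variable 'reviews' referenced before assignment. I don't know what is different about those products. Can you guide me on how to fix this? Thanks! Here is the code: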

import requests 
from bs4 import BeautifulSoup as bs 
import json 
import random 
import os.path
import time
import pandas as pd

def scrape_products(response):

    dataframe = pd.DataFrame()
    res = pd.DataFrame()

    # Checking if response is okay or not
    if response.ok:
        response = response.text
        content = bs(response,'lxml')

        # Selecting products
        items = content.find_all('li' , class_ = 'zg-item-immersion')

        for item in items:            
            # Selecting data about each product
            count = item.find('span' , class_ = 'zg-badge-text').text.strip()
            title = item.find('div' , class_ = 'p13n-sc-truncate').text.strip()
            price = item.find('span' , class_ = 'p13n-sc-price')
            try:
                rating = item.find('span' , class_ = 'a-icon-alt').text.strip()
                total = item.find('a' , class_ = 'a-size-small a-link-normal').text.strip()
                reviews = item.find('div' , class_ = 'a-icon-row a-spacing-none').find('a',class_='a-link-normal').get('href')
            except:
                pass

            image_url = item.find('div' , class_ = 'a-section a-spacing-small').find('img').get('src')

            go_to = item.find('span', class_ = 'a-list-item').find('a' , class_ = 'a-link-normal').get('href')
            product_url = 'https://www.amazon.com' + go_to
    #        product_url = product_url.replace('?','/ref=zg_bs_2399939011_1?')



            reviews_url = 'https://www.amazon.com' + reviews

    #         desc = requests.get(product_url)

    #         print(desc.status_code)
    #         if desc.ok:
    #             desc = desc.text
    #             data = bs(desc,'lxml')

    #             old_price = data.find('span',class_='priceBlockStrikePriceString a-text-strike').text

    #             if(old_price):
    #                 print(old_price)
    #             else:
    #                 pass

            print('************************************************ ' + count +' **********************************************')
            print('Title: {}'.format(title + '\n'))

            if(price):
                price = price.text.strip()
                price = price[1:]
                print('Price: {}'.format(price))
            else:
                pass

            print('Rating: {} ({})'.format(rating , total))
            print('Reviews Url: {}'.format(reviews_url))
            print('Image Url: {}'.format(image_url))
            print('Product Url: {}'.format(product_url))


            print()
            print()

            data = {'Title':[title], 'Price':[price], 'Rating':[str(rating) +'('+ str(total)+')'],
                    'Reviews Url':[reviews_url], 'Image Url': [image_url], 'Product Url':[product_url]}

            df = pd.DataFrame(data)

            dataframe = dataframe.append(df)
    return dataframe



def main():

    page_1 = 'https://www.amazon.com/Best-Sellers-Amazon-Device-Smart-Locks/zgbs/amazon-devices/17295887011/ref=zg_bs_nav_3_5499877011'
    #page_2 = 'https://www.amazon.com/Best-Sellers-Fire-Tablets-Bundles/zgbs/amazon-devices/17142718011/ref=zg_bs_pg_1?_encoding=UTF8&pg=2'

    response_1 = requests.get(page_1)
    #response_2 = requests.get(page_2)


    df1 = scrape_products(response_1)
    #df2 = scrape_products(response_2)

    #df = df1.append(df2)
    df1.Price = df1.Price.astype(float)
    df1 = df1.sort_values(by=['Price'])
    df1.to_csv('AMAZON\Amazon Devices & Accessories\Amazon Device Accessories\Best Sellers in Amazon Device Smart Locks.csv',index=False)


if __name__== "__main__":
  main()

beautifulsoup

python

web-scraping

1 Answer

Stack Overflow user

Answered 2020-04-01 01:49:08

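You assign reviews inside a try/except block that ignores errors, so when an error occurs reviews is never assigned. In that case, reviews_url = 'https://www.amazon.com' + reviews references reviews before assignment. Worse, after the first successful iteration, rating, total and reviews may hold stale data from a previous pass through the loop, and the script silently outputs bad data. You need a strategy for handling errors, such as skipping the item: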

        try:
            rating = item.find('span' , class_ = 'a-icon-alt').text.strip()
            total = item.find('a' , class_ = 'a-size-small a-link-normal').text.strip()
            reviews = item.find('div' , class_ = 'a-icon-row a-spacing-none').find('a',class_='a-link-normal').get('href')
        except:
            print("ERROR scanning {}, ignored".format(item))
            import traceback
            traceback.print_exc()
            continue
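To make both failure modes concrete, here is a minimal sketch that reproduces them without any network access. The function name build_review_urls and the plain list of href strings are hypothetical stand-ins for the question's loop over scraped items; a None entry plays the role of a product with no reviews link.

```python
def build_review_urls(hrefs):
    """Mimic the question's loop; `hrefs` stands in for the scraped items."""
    urls = []
    for href in hrefs:
        try:
            # Stand-in for item.find(...).get('href'): None has no .strip(),
            # so a missing link raises AttributeError here.
            reviews = href.strip()
        except Exception:
            pass  # error swallowed, so `reviews` is NOT (re)assigned
        # Uses whatever `reviews` currently holds, if anything.
        urls.append('https://www.amazon.com' + reviews)
    return urls
```

Calling build_review_urls([None]) raises UnboundLocalError, because the first item fails and reviews was never assigned. Calling build_review_urls(['/a', None]) raises nothing, but the second URL silently repeats the first item's link, which is the stale-data problem described above.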

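Catching bare exceptions is not a good idea. It masks bugs as well as the errors you expect. Once you print out the errors, you will get a feel for which ones can safely be ignored, and you can narrow the exception handler to something like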

        except (IndexError, ValueError) as e:
            ....
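As one possible way to apply this to the reviews link, the sketch below moves the lookup into a small helper that always returns a value. The helper name extract_reviews_url is hypothetical. It assumes the failure is a missing element: BeautifulSoup's find() returns None when nothing matches, so chaining .find() or .get() onto that result raises AttributeError, which is likely the exception worth catching here rather than the tuple shown above. Print the traceback first to confirm which exception your pages actually produce.

```python
def extract_reviews_url(item):
    """Return the absolute reviews URL for one item, or None if it has none.

    `item` is expected to behave like a BeautifulSoup Tag (hypothetical helper).
    """
    try:
        row = item.find('div', class_='a-icon-row a-spacing-none')
        href = row.find('a', class_='a-link-normal').get('href')
    except AttributeError:
        # find() returned None somewhere in the chain: the element is missing.
        return None
    if href is None:
        # The <a> tag exists but carries no href attribute.
        return None
    return 'https://www.amazon.com' + href
```

Inside the loop, reviews_url = extract_reviews_url(item) then assigns the variable on every iteration, so it can never be unbound or left over from a previous item. The caller decides whether a None result means skipping the item or writing an empty cell.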

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/60956639
