Q: Scraping products from a Shopify site - unexpected results

Stack Overflow user

Asked 2021-01-07 22:49:41

Answers: 1, Views: 101, Followers: 0, Votes: 0

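I'm new to coding in general, but for my first project I'm trying to build a monitor that watches a Shopify site for product changes.

My approach is to take publicly shared code from the web and work backwards from it to understand it. I found the following code inside a larger class; it appears to fetch products.json by looping through the pages.

However, when I load https://www.hanon-shop.com/collections/all/products.json and then print the item list built by the code below, the first few products are different. Why would that be?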

def scrape_site(self):
        """
        Scrapes the specified Shopify site and adds items to array
        :return: None
        """
        self.items = []
        s = rq.Session()
        page = 1
        while page > 0:
            try:
                html = s.get(self.url + '?page=' + str(page) + '&limit=250', headers=self.headers, proxies=self.proxy, verify=False, timeout=20)
                output = json.loads(html.text)['products']
                if output == []:
                    page = 0
                else:
                    for product in output:
                        product_item = [{'title': product['title'], 'image': product['images'][0]['src'], 'handle': product['handle'], 'variants':product['variants']}]
                        self.items.append(product_item)
                    logging.info(msg='Successfully scraped site')
                    page += 1
            except Exception as e:
                logging.error(e)
                page = 0
            time.sleep(0.5)
        s.close()

python

html

web-scraping

python-requests

shopify

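1 Answer

Stack Overflow user

Answered 2021-01-08 00:34:27

Requests accepts a dictionary of query parameters (the params argument), and the response object has a json() method, so this can be written more cleanly.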

import logging
import time
import requests


def scrape_site(self):
    self.items = []
    page = 1

    with requests.Session() as s:
        while True:
            params = {
              'page': page,
              'limit': 250
            }
        
            try:
                r = s.get(self.url, params=params, headers=self.headers, proxies=self.proxy, verify=False, timeout=20)
                r.raise_for_status()
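                # r.json() is a dict like {"products": [...]}, which is truthy even
                # when the list is empty, so test the list itself to find the last page.
                products = r.json()['products']
                if not products:
                    break
                for product in products:
                    # Some products may have no images; avoid an IndexError on them.
                    images = product['images']
                    product_item = {
                        'title': product['title'],
                        'image': images[0]['src'] if images else None,
                        'handle': product['handle'],
                        'variants': product['variants']
                    }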
                    self.items.append(product_item)
                logging.info(f'Successfully scraped page {page}')
                page += 1
                time.sleep(1)
                
            except Exception as e:
                logging.error(e)
                break

    return self.items
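
The stopping condition of a paginated scrape is easier to check when the HTTP call is kept separate from the loop. The following is only a sketch, not the original author's code: `collect_products`, `fetch_page`, `fake_fetch`, `FAKE_PAGES` and the `max_pages` cap are hypothetical names introduced for illustration, and it assumes the endpoint answers with `{"products": []}` once the pages run out.

```python
from typing import Callable, Dict, List


def collect_products(fetch_page: Callable[[int], Dict], max_pages: int = 100) -> List[Dict]:
    """Collect products page by page until an empty page is returned.

    fetch_page is a hypothetical callable: it takes a 1-based page number and
    returns the decoded JSON payload, e.g. {"products": [...]}.
    """
    items = []
    for page in range(1, max_pages + 1):
        products = fetch_page(page).get("products", [])
        if not products:  # an empty list marks the end of the catalogue
            break
        for product in products:
            images = product.get("images") or []
            items.append({
                "title": product["title"],
                "image": images[0]["src"] if images else None,
                "handle": product["handle"],
                "variants": product.get("variants", []),
            })
    return items


# Offline stand-in for the HTTP call: two pages of made-up data, then empty pages.
FAKE_PAGES = {
    1: {"products": [{"title": "A", "handle": "a", "images": [{"src": "a.jpg"}], "variants": []}]},
    2: {"products": [{"title": "B", "handle": "b", "images": [], "variants": []}]},
}


def fake_fetch(page: int) -> Dict:
    """Return the made-up payload for a page, or an empty product list."""
    return FAKE_PAGES.get(page, {"products": []})


if __name__ == "__main__":
    print([item["handle"] for item in collect_products(fake_fetch)])
```

With Requests, `fetch_page` could be a small function that calls `s.get(url, params={'page': page, 'limit': 250}, timeout=20)` on a session and returns `r.json()`. Keeping it injectable means the loop can be exercised without touching the network, and the `max_pages` cap guards against an endless loop if the end-of-data check ever fails.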

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/65614589
