Q: Scrapy runs slower on the more powerful machine than on the weaker one, and I don't understand why

Stack Overflow user

Asked 2021-09-05 10:38:04

Answers: 1 · Views: 28 · Followers: 0 · Votes: 0

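The devices I'm talking about have the following specs:

-> desktop PC: Intel i5 4690, 16 GB RAM, Ethernet connection (about 500 Mbps download, about 750 Mbps upload)

-> laptop: Intel i7-3520M, 8 GB RAM, Wi-Fi connection (about 100 Mbps download, about 120 Mbps upload)

Download and upload speeds were measured with speedtest.net, making sure the test requests went to the same server so that the results for the two devices are comparable.

I used exactly the same code on both devices. I wrote it on the desktop PC and then moved the whole project to the laptop.

First, let me show you the results of scrapy bench: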

laptop (worse)
2021-09-05 12:56:25 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 113852,
 'downloader/request_count': 293,
 'downloader/request_method_count/GET': 293,
 'downloader/response_bytes': 730855,
 'downloader/response_count': 293,
 'downloader/response_status_count/200': 293,
 'elapsed_time_seconds': 10.813298,
 'finish_reason': 'closespider_timeout',
 'finish_time': datetime.datetime(2021, 9, 5, 9, 56, 25, 216524),
 'log_count/INFO': 20,
 'request_depth_max': 13,
 'response_received_count': 293,
 'scheduler/dequeued': 293,
 'scheduler/dequeued/memory': 293,
 'scheduler/enqueued': 5859,
 'scheduler/enqueued/memory': 5859,
 'start_time': datetime.datetime(2021, 9, 5, 9, 56, 14, 403226)}

desktop
2021-09-05 13:13:50 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 216900,
 'downloader/request_count': 503,
 'downloader/request_method_count/GET': 503,
 'downloader/response_bytes': 1467597,
 'downloader/response_count': 503,
 'downloader/response_status_count/200': 503,
 'elapsed_time_seconds': 10.689637,
 'finish_reason': 'closespider_timeout',
 'finish_time': datetime.datetime(2021, 9, 5, 10, 13, 50, 146422),
 'log_count/INFO': 20,
 'request_depth_max': 19,
 'response_received_count': 503,
 'scheduler/dequeued': 503,
 'scheduler/dequeued/memory': 503,
 'scheduler/enqueued': 10060,
 'scheduler/enqueued/memory': 10060,
 'start_time': datetime.datetime(2021, 9, 5, 10, 13, 39, 456785)}
2021-09-05 13:13:50 [scrapy.core.engine] INFO: Spider closed (closespider_timeout)

Scrapy has better bench stats on the desktop.
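To put a number on that comparison, the two stat dumps can be reduced to pages per second. The sketch below only uses values copied from the logs above; the helper name `pages_per_second` is made up for illustration. Keep in mind that `scrapy bench` crawls a local test server without any browser, so it says little about how long Selenium-driven page loads take.

```python
def pages_per_second(stats):
    """Return responses received per second of elapsed crawl time."""
    return stats["response_received_count"] / stats["elapsed_time_seconds"]


# Values copied from the two `scrapy bench` stat dumps above.
laptop = {"response_received_count": 293, "elapsed_time_seconds": 10.813298}
desktop = {"response_received_count": 503, "elapsed_time_seconds": 10.689637}

laptop_rate = pages_per_second(laptop)
desktop_rate = pages_per_second(desktop)

print(f"laptop:  {laptop_rate:.1f} pages/s")
print(f"desktop: {desktop_rate:.1f} pages/s")
print(f"desktop/laptop ratio: {desktop_rate / laptop_rate:.2f}")
```

On these figures the desktop handles roughly 47 pages/s against roughly 27 pages/s on the laptop, so plain Scrapy is not the slow part on the desktop.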

Here is my spider, just so you can see what I'm working with; as I said, the spider is exactly the same on both devices.

# -*- coding: utf-8 -*-
import scrapy

from scrapy.spiders.init import InitSpider
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from scrapy.selector import Selector
from scrapy_selenium import SeleniumRequest

class ExampleSpider(InitSpider):
    name = 'example'
    
    def init_request(self):
        yield SeleniumRequest(
            url='https://www.ejobs.ro/',
            wait_time=3,
            callback=self.search
        )

        return self.initialized()

    def search(self, response):
        driver = response.meta['driver']
        search_input = driver.find_element_by_xpath("//input[@id='keyword']")
        search_input.send_keys("programator")

        search_input2 = driver.find_element_by_xpath("//input[@id='s2id_autogen1']")
        search_input2.send_keys("bucuresti")
        selectieOras = driver.find_element_by_xpath("//input[@id='s2id_autogen1_search']")
        selectieOras.send_keys(Keys.ENTER)

        submit = driver.find_element_by_xpath("//button[@id='submit']")
        driver.execute_script("arguments[0].click();", submit)

        try:
            element = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.ID, "searchSection"))
            )
        finally:
            yield SeleniumRequest(
                url=driver.current_url,
                wait_time=3,
                callback=self.parse
            )

    def parse(self, response):  
        driver = response.meta['driver'] 
        try:
            element = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.ID, "searchSection"))
            )
        finally:
            html = driver.page_source
            response_obj = Selector(text=html)
            
            links = response_obj.xpath("//div[@class='jobitem-body']")
            for link in links:
                URL = link.xpath(".//a[contains(@class, 'title')]/@href").get()

                if URL:
                    yield SeleniumRequest(
                        url=URL,
                        wait_time=3,
                        callback=self.parse_res
                    )

            # next = response_obj.xpath("//div[@id='searchPagination']/li[@class='next']/a/@href")
            # if next:
            #     hrefLink = next.get()
            #     yield SeleniumRequest(
            #         url=hrefLink,
            #         wait_time=3,
            #         callback=self.parse
            #     )

    def parse_res(self, response):
        yield {
            'title': response.xpath("//h1[@class='jobad-title']/text()").get()
        }

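What I've tried:

- making sure both devices have the same version of Scrapy installed (2.5.0 on both)

- making sure both devices have the same version of Selenium installed (3.141.0 on both)

- making sure both devices have the same version of Anaconda3 installed (2021.05, Python 3.8.8, 64-bit on both)

- making sure both devices have the same version of Firefox installed (91.0.2 on both)

- making sure both devices have the same version of scrapy-selenium installed (0.0.7 on both)

- using the same geckodriver.exe (as I said, I simply copied the project from the desktop and pasted it onto the laptop)

- creating a new, clean Anaconda environment with only scrapy, selenium, and scrapy-selenium installed

- flushing the DNS cache on the desktop

- deleting all temporary data and browsing data on the desktop

None of this worked. The only differences I know of are:

- the desktop has Python 3.7.415.0, while the laptop has 3.7.315.0 (I don't think this is the problem; if it is, just tell me and I'll downgrade Python on the desktop)

- the operating system on the desktop is Windows 10 Education N, while the laptop runs Windows 10 Pro (I don't think this is the problem either)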
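The lists above mention both an Anaconda Python 3.8.8 and a separate Python 3.7.x install, so it may be worth confirming which interpreter and which package versions actually run the spider on each machine. The following is a minimal sketch using only the standard library (`importlib.metadata` needs Python 3.8 or later); the function name `environment_report` is hypothetical.

```python
import platform
import sys
from importlib import metadata  # standard library since Python 3.8


def environment_report(packages):
    """Collect interpreter, OS, and package versions into a dict."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,  # shows which install is really in use
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed for this interpreter
    return report


if __name__ == "__main__":
    wanted = ["Scrapy", "selenium", "scrapy-selenium"]
    for key, value in environment_report(wanted).items():
        print(f"{key}: {value}")
```

Running this with the same command used to launch the spider on both machines and comparing the two outputs line by line would rule the environment in or out.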

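The problem is that the spider runs much faster on my laptop than on my PC. On the PC each request takes ~7 seconds (occasionally 6), while on the laptop a request takes at most 4 seconds and fluctuates (sometimes 1 second, sometimes 2, sometimes 3). I really don't understand why.

I've tried everything I could think of. What could the problem be?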
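One way to find out where those seconds go is to log the wall-clock time of every request from inside Scrapy. The sketch below is a hypothetical downloader middleware, not part of Scrapy or scrapy-selenium; it assumes the Scrapy 2.5 middleware signatures and the meta keys `timing_start` and `timing_elapsed` are invented names. Because scrapy-selenium's middleware loads the page in the browser and returns the response from its own `process_request`, this timer has to run before it, which is why a lower priority number is suggested in the comments.

```python
import logging
import time

logger = logging.getLogger(__name__)


class RequestTimingMiddleware:
    """Hypothetical downloader middleware: logs seconds spent per request.

    Assumed settings.py entry (the module path is a placeholder):

        DOWNLOADER_MIDDLEWARES = {
            "myproject.middlewares.RequestTimingMiddleware": 700,
            "scrapy_selenium.SeleniumMiddleware": 800,
        }
    """

    def process_request(self, request, spider):
        # Stamp the start time; returning None lets processing continue.
        request.meta["timing_start"] = time.perf_counter()
        return None

    def process_response(self, request, response, spider):
        # Compute the elapsed time once the response comes back.
        start = request.meta.pop("timing_start", None)
        if start is not None:
            elapsed = time.perf_counter() - start
            request.meta["timing_elapsed"] = elapsed
            logger.info("%.2fs %s", elapsed, request.url)
        return response
```

If the logged times are close to 7 seconds on the desktop, the delay sits in the browser page load (or the network path to the site) rather than in Scrapy's scheduling, which narrows the search considerably.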

python

selenium

selenium-webdriver

scrapy

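1 Answer

Stack Overflow user

Answered 2021-09-05 10:57:45

Your desktop has more (and perhaps faster) memory. Hard-disk speed and applications running in the background also affect performance.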
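The disk-speed part of this suggestion can be checked roughly with a small timing script run on both machines. This is only a sketch under simple assumptions: the helper name `time_disk_write` and the 32 MiB default are arbitrary, and a single sequential write says nothing about memory or background load.

```python
import os
import tempfile
import time


def time_disk_write(size_mb=32):
    """Write `size_mb` MiB to a temp file, fsync it, return elapsed seconds."""
    chunk = os.urandom(1024 * 1024)  # 1 MiB of random bytes
    start = time.perf_counter()
    with tempfile.NamedTemporaryFile() as handle:
        for _ in range(size_mb):
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # force the data out to the disk
        elapsed = time.perf_counter() - start
    return elapsed


if __name__ == "__main__":
    print(f"wrote 32 MiB in {time_disk_write():.3f}s")
```

A large gap between the two machines would support the disk explanation; similar numbers would point elsewhere.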

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/69062586
