How to fix a reset connection in Scrapy

Stack Overflow user
Asked 2022-07-12 18:59:27
Answers: 1 · Views: 146 · Followers: 0 · Score: 0

I keep getting an error when I try to scrape a few URLs with Scrapy through a Selenium middleware.

Middleware.py:

Code language: Python
import undetected_chromedriver as uc
from scrapy.http import HtmlResponse


class SeleniumMiddleWare(object):

    def __init__(self):
        path = "G:/Downloads/chromedriver.exe"
        options = uc.ChromeOptions()
        options.headless = True
        # Disable image loading to speed up rendering
        chrome_prefs = {}
        options.experimental_options["prefs"] = chrome_prefs
        chrome_prefs["profile.default_content_settings"] = {"images": 2}
        chrome_prefs["profile.managed_default_content_settings"] = {"images": 2}
        self.driver = uc.Chrome(options=options, use_subprocess=True, driver_executable_path=path)

    def process_request(self, request, spider):
        # Render the page in the browser instead of Scrapy's own downloader
        try:
            self.driver.get(request.url)
        except:
            pass
        content = self.driver.page_source
        # Shuts down the browser and its chromedriver process
        self.driver.quit()

        return HtmlResponse(request.url, encoding='utf-8', body=content, request=request)

    def process_response(self, request, response, spider):
        return response

Spider.py:

Code language: Python
import scrapy


class SeleniumSpider(scrapy.Spider):
    name = 'steamdb'
    #allowed_domains = ['steamdb.info']
    start_urls = ['https://steamdb.info/graph/']

    def parse(self, response):
        table = response.xpath('//*[@id="table-apps"]/tbody')
        rows = table.css('tr[class="app"]')
        #b= a.css('tr [class = "app"]::text')
        #table = b.xpath('//*[@id="table-apps"]/tbody/tr')

        for element in rows:
            # Resolve the relative href against the current page URL
            link = response.urljoin(element.css('::attr(href)').get())
            name = element.css('a ::text')[0].get()
            game_info = {"Link": link, "Name": name}
            # Hand game_info to parse_info as a keyword argument
            yield scrapy.Request(url=link, callback=self.parse_info, cb_kwargs=dict(game_info=game_info))

    def parse_info(self, response, game_info):
        game_info["sales"] = response.xpath('//*[@id="graphs"]/div[5]/div[2]/ul/li[1]/strong/span/text()').getall()
        yield game_info
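Why the link has to be built with a URL join rather than str.join: str.join does not prepend a base URL, it inserts the base string between every character of its argument. The snippet below contrasts the two with the standard library's urllib.parse.urljoin (Scrapy's response.urljoin builds on the same function). The href value is a hypothetical relative link chosen for illustration.

```python
from urllib.parse import urljoin

BASE = "https://steamdb.info"
href = "/app/730/"  # hypothetical relative link, as found in an <a href="...">

# str.join places BASE *between* the characters of href; it does not prepend it.
wrong = BASE.join(href)

# urljoin resolves the relative path against the base URL.
right = urljoin(BASE, href)

print(wrong[:30])  # /https://steamdb.infoahttps://
print(right)       # https://steamdb.info/app/730/
```

A URL like the "wrong" one has no valid scheme, so it cannot be requested as intended.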

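Note: the scraper works as long as it does not make new requests (with cb_kwargs) or follow links. If I scrape only the page in start_urls it works, but as soon as I send new requests to other URLs or follow pages, it fails.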

Error:

Code language: text
2022-07-12 20:53:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://steamdb.info/graph/> (referer: https://steamdb.info/graph/)
2022-07-12 20:53:54 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://localhost:52304/session/99578d3d4f168c77b58a85f67be06927/execute/sync {"script": "return navigator.webdriver", "args": []}
2022-07-12 20:53:54 [urllib3.connectionpool] DEBUG: Resetting dropped connection: localhost
2022-07-12 20:53:56 [urllib3.util.retry] DEBUG: Incremented Retry for (url='/session/99578d3d4f168c77b58a85f67be06927/execute/sync'): Retry(total=2, connect=None, read=None, redirect=None, status=None)
2022-07-12 20:53:56 [urllib3.connectionpool] WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000015E5EB66EC0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it')': /session/99578d3d4f168c77b58a85f67be06927/execute/sync
2022-07-12 20:53:56 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (2): localhost:52304
2022-07-12 20:53:58 [urllib3.util.retry] DEBUG: Incremented Retry for (url='/session/99578d3d4f168c77b58a85f67be06927/execute/sync'): Retry(total=1, connect=None, read=None, redirect=None, status=None)
2022-07-12 20:53:58 [urllib3.connectionpool] WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000015E5ED6C970>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it')': /session/99578d3d4f168c77b58a85f67be06927/execute/sync

1 answer

Stack Overflow user

Accepted answer

Posted 2022-07-12 19:06:30

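"the target machine actively refused it" means the machine responded, but the specified port (52304) is closed, i.e. nothing is listening on it. Can you check whether you can reach it? Maybe a local firewall is blocking it?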
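One way to run the check suggested above, sketched with Python's standard socket module. port_is_open is a hypothetical helper name; it only tells you whether something accepts TCP connections on the port, not whether that something is chromedriver.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port (hypothetical helper)."""
    try:
        # create_connection tries every address that host resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # A refused connection (WinError 10061 on Windows) raises
        # ConnectionRefusedError, a subclass of OSError; timeouts and
        # unreachable hosts are OSErrors too.
        return False
```

While the crawl is running, port_is_open("localhost", 52304) should return True; after the driver has been quit it should return False. The port is the one from the log above and typically changes from one driver session to the next.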

UPD: It looks like .quit() is called in every process_request. Either restart the driver, or don't call .quit() until you are finished.
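A minimal sketch of the second option, assuming the goal is one browser per crawl: the browser is started once, reused by every request, and quit only when the crawl ends. DriverSession and driver_factory are hypothetical names; the Driver protocol lists only the WebDriver members the question's middleware already uses (get, page_source, quit), so the logic can be exercised without a real browser.

```python
from typing import Callable, Optional, Protocol


class Driver(Protocol):
    """The WebDriver members the question's middleware already uses."""

    page_source: str

    def get(self, url: str) -> None: ...

    def quit(self) -> None: ...


class DriverSession:
    """Keeps one browser alive for the whole crawl (hypothetical helper)."""

    def __init__(self, driver_factory: Callable[[], Driver]) -> None:
        # driver_factory: any zero-argument callable that starts a browser.
        self._factory = driver_factory
        self._driver: Optional[Driver] = None
        self.starts = 0  # number of browsers launched so far

    def _ensure_driver(self) -> Driver:
        # Start a browser only if none is running (first use, or after close()).
        if self._driver is None:
            self._driver = self._factory()
            self.starts += 1
        return self._driver

    def fetch(self, url: str) -> str:
        """Load url in the shared browser and return the rendered HTML."""
        driver = self._ensure_driver()
        driver.get(url)
        return driver.page_source

    def close(self) -> None:
        """Quit the browser once; calling it again is a no-op."""
        if self._driver is not None:
            self._driver.quit()
            self._driver = None
```

To wire this into the middleware, one could pass lambda: uc.Chrome(options=options, use_subprocess=True, driver_executable_path=path) as driver_factory, return HtmlResponse(request.url, encoding='utf-8', body=session.fetch(request.url), request=request) from process_request, and call session.close() from a handler registered in from_crawler with crawler.signals.connect(handler, signal=signals.spider_closed). Quitting in that handler instead of in process_request keeps the chromedriver port open for every later request.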

Score: 1
Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/72957179

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档