
How to fix the "PROXIES is empty" error in a Scrapy spider

Stack Overflow user
Asked on 2019-02-15 05:38:41
1 answer · 561 views · 0 followers · Score: 1

I am trying to run a Scrapy spider through a proxy, but I hit an error every time I run the code.

This is on macOS with Python 3.7 and Scrapy 1.5.1. I have tried changing the settings and the middlewares, but with no effect.

```python
import scrapy


class superSpider(scrapy.Spider):
    name = "myspider"

    def start_requests(self):
        print('request')
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        print('parse')
```

The error I get is:

```
2019-02-15 08:32:27 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: superScraper)
2019-02-15 08:32:27 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28) - [Clang 6.0 (clang-600.0.57)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Darwin-17.7.0-x86_64-i386-64bit
2019-02-15 08:32:27 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'superScraper', 'CONCURRENT_REQUESTS': 25, 'NEWSPIDER_MODULE': 'superScraper.spiders', 'RETRY_HTTP_CODES': [500, 503, 504, 400, 403, 404, 408], 'RETRY_TIMES': 10, 'SPIDER_MODULES': ['superScraper.spiders'], 'USER_AGENT': 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'}
2019-02-15 08:32:27 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
2019-02-15 08:32:27 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 171, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 175, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 80, in crawl
    self.engine = self._create_engine()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 105, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/middleware.py", line 36, in from_settings
    mw = mwcls.from_crawler(crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy_proxies/randomproxy.py", line 99, in from_crawler
    return cls(crawler.settings)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy_proxies/randomproxy.py", line 74, in __init__
    raise KeyError('PROXIES is empty')
builtins.KeyError: 'PROXIES is empty'
```

The URLs are from the Scrapy documentation, and the spider works when no proxy is used.
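The traceback shows the crash happens inside the scrapy_proxies middleware at startup, before the spider ever runs: RandomProxy raises KeyError('PROXIES is empty') while it is being constructed. That middleware is configured entirely through settings.py. A sketch of the settings from the scrapy-proxies README (the proxy-list path here is a placeholder, not a real file):

```python
# settings.py -- sketch of a scrapy-proxies setup, per its README;
# PROXY_LIST must point at a real file you provide.
RETRY_TIMES = 10
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# File with one proxy per line, e.g. http://host:port
# or http://user:password@host:port
PROXY_LIST = '/path/to/proxy/list.txt'

# 0 = pick a different random proxy for every request
PROXY_MODE = 0
```

If PROXY_LIST is missing, unreadable, or yields no usable entries, the middleware has no proxies to choose from and fails at engine startup exactly as in the log above.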


1 answer

Stack Overflow user

Accepted answer

Posted on 2019-02-15 08:21:52

For anyone with a similar problem: the issue was with the scrapy_proxies RandomProxy code actually installed on my machine.

Using the code from here made it work: https://github.com/aivarsk/scrapy-proxies

Go into the scrapy_proxies folder and replace the randomproxy.py code with the version from GitHub.

Mine was found at: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy_proxies/randomproxy.py
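For context, what that file does is essentially load the PROXY_LIST file, parse each line into a proxy URL plus optional credentials, and raise the 'PROXIES is empty' KeyError when nothing usable was parsed. A simplified, illustrative sketch of that parsing step (this is not the library's actual code; the function name and regex are assumptions for demonstration):

```python
import random
import re

def load_proxies(text):
    """Parse proxy-list file contents into {proxy_url: credentials_or_None},
    skipping blank lines; raise if nothing usable was found."""
    proxies = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Lines look like http://user:pass@host:port or http://host:port
        m = re.match(r'(\w+://)(?:([^@]+)@)?(.+)', line)
        if m:
            scheme, creds, host = m.groups()
            proxies[scheme + host] = creds  # creds is None when absent
    if not proxies:
        # the same failure mode seen in the traceback above
        raise KeyError('PROXIES is empty')
    return proxies

proxies = load_proxies("http://user1:pw1@10.0.0.1:8080\nhttp://10.0.0.2:3128\n")
chosen = random.choice(list(proxies))  # one random proxy per request
```

So if your installed randomproxy.py mishandles this parsing (or the list file is empty), the middleware ends up with no proxies and aborts the crawl at startup, which is why replacing the file with the current GitHub version fixed it.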

Score: 1
Original content provided by Stack Overflow; translation supported by Tencent Cloud.
Original link: https://stackoverflow.com/questions/54699460
