I am working on a data scraping project. My code uses Scrapy (version 1.0.4) and Selenium (version 2.47.1).

import time

from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider
from selenium import webdriver


class TradesySpider(CrawlSpider):
    name = 'tradesy'
    start_urls = ['My Start url']

    def __init__(self, *args, **kwargs):
        super(TradesySpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        while True:
            # Parse the page Selenium is currently showing; the Scrapy
            # response itself does not change after clicking "next".
            page = Selector(text=self.driver.page_source)
            tradesy_urls = page.xpath('//div[@id="right-panel"]')
            data_urls = tradesy_urls.xpath('div[@class="item streamline"]/a/@href').extract()
            for link in data_urls:
                url = 'My base url' + link
                yield Request(url=url, callback=self.parse_data)
            time.sleep(10)
            try:
                data_path = self.driver.find_element_by_xpath('//*[@id="page-next"]')
            except Exception:
                # No "next" button found, so this was the last page.
                break
            data_path.click()
            time.sleep(10)

    def parse_data(self, response):
        'Scrapy Operations...'

When I run my code, I get the expected output for some URLs, but for others I get the following error:

2016-01-19 15:45:17 [scrapy] DEBUG: Retrying <GET MY_URL> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL3_READ_BYTES', 'ssl handshake failure')]>]

Please suggest a solution for this problem.

Posted on 2016-01-21 16:17:24

According to this reported issue, you can create your own ContextFactory to handle SSL.

context.py:

from OpenSSL import SSL
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory


class CustomContextFactory(ScrapyClientContextFactory):
    """
    Custom context factory that allows SSL negotiation.
    """

    def __init__(self):
        # Use SSLv23_METHOD so we can use protocol negotiation
        self.method = SSL.SSLv23_METHOD

settings.py:

DOWNLOADER_CLIENTCONTEXTFACTORY = 'yourproject.context.CustomContextFactory'

Posted on 2020-10-23 16:52:44

With Scrapy 1.5.0, I ran into the following error:

Error downloading: https://my.website.com>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'tls12_check_peer_sigalg', 'wrong curve')]>]

What finally worked was upgrading Twisted (from 17.9.0 to 19.10.0). I also upgraded Scrapy to 2.4.0, along with a few other packages.
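
Since the fix here came down to package versions, it can help to record what is installed before and after upgrading. Below is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`). The helper name `installed_versions` and the package list are illustrative choices, not anything from Scrapy itself.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages that commonly take part in Scrapy's TLS handling.
# This list is an assumption for illustration; adjust it to your environment.
PACKAGES = ("Scrapy", "Twisted", "pyOpenSSL", "cryptography")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print("{}: {}".format(name, ver or "not installed"))
```

Running `scrapy version -v` from the command line reports similar information for the libraries Scrapy depends on.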
https://stackoverflow.com/questions/34875175