
Q: Crawlera middleware order with httpcache enabled

Stack Overflow user

Asked on 2017-04-23 06:24:05

1 answer, 165 views, 0 followers, 1 vote

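I don't want to use the Crawlera proxy service for pages that are already cached by the httpcache middleware, because I have a monthly limit on the number of calls.

I am using the Crawlera middleware and enable it with the following setting: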

DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}

as suggested in the documentation (https://scrapy-crawlera.readthedocs.io/en/latest/).

However, after the crawl finishes, I get:

2017-04-23 00:14:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'crawlera/request': 11,
 'crawlera/request/method/GET': 11,
 'crawlera/response': 11,
 'crawlera/response/status/200': 10,
 'crawlera/response/status/301': 1,
 'downloader/request_bytes': 3324,
 'downloader/request_count': 11,
 'downloader/request_method_count/GET': 11,
 'downloader/response_bytes': 1352925,
 'downloader/response_count': 11,
 'downloader/response_status_count/200': 10,
 'downloader/response_status_count/301': 1,
 'dupefilter/filtered': 6,
 'finish_reason': 'closespider_pagecount',
 'finish_time': datetime.datetime(2017, 4, 22, 22, 14, 24, 839013),
 'httpcache/hit': 11,
 'log_count/DEBUG': 12,
 'log_count/INFO': 9,
 'request_depth_max': 1,
 'response_received_count': 10,
 'scheduler/dequeued': 10,
 'scheduler/dequeued/memory': 10,
 'scheduler/enqueued': 23,
 'scheduler/enqueued/memory': 23,
 'start_time': datetime.datetime(2017, 4, 22, 22, 14, 24, 317893)}
2017-04-23 00:14:24 [scrapy.core.engine] INFO: Spider closed (closespider_pagecount)

Note in particular:

'downloader/request_count': 11
'crawlera/request/method/GET': 11
'httpcache/hit': 11

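So I am not sure whether these calls went through the Crawlera proxy service. When I changed the order of the Crawlera middleware to 901, 749, or 751, I got the same result.

Does anyone know what is going on under the hood? Are these pages returned directly from the HTTP cache, without calling the Crawlera servers?

Thanks!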

scrapy

1 Answer

Stack Overflow user

Answered on 2019-03-25 15:18:25

Think of the number as an ordering value relative to the other middlewares.

'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 600,
'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
'rotating_proxies.middlewares.BanDetectionMiddleware': 620

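Just make sure that httpcache.HttpCacheMiddleware has a lower number than the proxy middleware. Scrapy calls process_request() on downloader middlewares in increasing order of that number, and once one of them returns a response, the remaining process_request() methods are skipped, so a cache hit never reaches the proxy middleware.

This works fine for me.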
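Applied to the question's setup, the same ordering might look like the sketch below. This is an assumption, not something confirmed in the thread: the answer above uses rotating_proxies, and the numbers 600 and 610 are carried over from it to scrapy_crawlera. The API key is a placeholder. Entries in DOWNLOADER_MIDDLEWARES override the default order of built-in middlewares, which is why the cache middleware is listed explicitly.

```python
# settings.py (sketch, assuming scrapy and scrapy-crawlera are installed)

# Enable Scrapy's built-in HTTP cache.
HTTPCACHE_ENABLED = True

# scrapy-crawlera settings; the key below is a placeholder, not a real value.
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your-api-key>"

DOWNLOADER_MIDDLEWARES = {
    # Lower number: its process_request() runs first, so a cached response
    # is returned before the proxy middleware handles the request.
    "scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware": 600,
    # Higher number: only requests that miss the cache get this far.
    "scrapy_crawlera.CrawleraMiddleware": 610,
}
```

One way to check the effect is to run the crawl twice and compare 'crawlera/request' with 'httpcache/hit' in the stats dump of the second run; with this ordering one would expect the Crawlera request counter to cover cache misses only.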

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/43565295
