My simple CrawlSpider is below. How can I add an X-Forwarded-For header to this crawler? The X-Forwarded-For header should be applied to every page that gets crawled.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'spidy'
    allowed_domains = ['website.com', 'www.website.com']
    start_urls = ['http://www.website.com/']

    rules = (
        Rule(LinkExtractor(allow=('/uk/', )), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        print(response.url)

P.S. I found a way to do this through settings.py, but is there a way to do it from within the spider itself? Thanks!
Posted on 2021-10-20 05:20:33
You can do this with the process_request argument of the Rule object, as shown below:
rules = (
    Rule(LinkExtractor(allow=('/uk/', )), callback='parse_item', follow=True, process_request='add_header'),
)

def add_header(self, request, response):
    request.headers['X-Forwarded-For'] = 'the_header_value'
    return request

See the docs for more details.
https://stackoverflow.com/questions/69571999