Here is what I am trying to achieve:
class Hello(Spider):
    # some stuff

    def parse(self, response):
        # get a list of city URLs using pickle and store it in a list
        # Now for each city URL I have to get the list of monuments (using selenium), which is done by the loops below
        for c in cities:
            # get the list of monuments using selenium and iterate through each monument URL contained in the division
            divs = sel.xpath('some xpath/div')
            for div in divs:
                monument_url = ''.join(div.xpath('some xpath'))
                # For each monument URL get the response and scrape the information
                yield Request(monument_url, self.parse_monument)

    def parse_monument(self, response):
        # scrape some information and return to the loop (i.e. return to "for div in divs:")
        pass

What happens now is:
1. I get the list of all monuments in all cities before the yield statement is executed.
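Is there any way to do this? Is there a way to get the response object that Request passes to parse_monument without going into the parse_monument method, so that I can use selectors to pick the elements I need from the response?
Thank you!!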
Posted on 2014-07-24 17:03:27
I don't think you can call back into a function the way you are trying to. Here is a refactoring:
import scrapy
from scrapy import Request


class HelloSpider(scrapy.Spider):
    name = "hello"
    allowed_domains = ["hello.com"]
    # The trailing comma matters: without it the parentheses hold a plain string, not a tuple
    start_urls = (
        'http://hello.com/cities',
    )

    def parse(self, response):
        cities = ['London', 'Paris', 'New-York', 'Shanghai']
        for city in cities:
            # 'some xpath' is a placeholder for the real expression
            xpath_exp = 'some xpath[city="' + city + '"]/div/some xpath'
            for monument_url in response.xpath(xpath_exp).extract():
                yield Request(monument_url, callback=self.parse_monument)

    def parse_monument(self, response):
        pass

Posted on 2014-12-27 17:34:39
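Request is an object, not a method. Scrapy processes the Request objects you yield and executes their callbacks asynchronously. You can think of a Request as being something like a thread object.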
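
To make that ordering concrete, here is a toy model in plain Python. It is not Scrapy's engine: `ToyRequest`, `run`, and the fake page bodies are hypothetical stand-ins, and the real scheduler downloads pages concurrently. The sketch only shows that `yield` hands an object to the caller, and that the callback runs later, outside the loop that yielded it.

```python
from collections import deque


class ToyRequest:
    """Hypothetical stand-in for scrapy.Request: it only stores data."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


log = []  # records the order in which things happen


def parse_monument(body):
    log.append("callback got " + body)


def parse(urls):
    for url in urls:
        log.append("yield " + url)
        # Yielding does not call parse_monument; it only hands over an object.
        yield ToyRequest(url, parse_monument)


def run(requests):
    """Toy engine: queue every yielded request, then 'download' and call back."""
    queue = deque(requests)  # building the deque drains the generator first
    while queue:
        request = queue.popleft()
        fake_body = "<page for %s>" % request.url  # pretend download
        request.callback(fake_body)


run(parse(["/big-ben", "/louvre"]))
for entry in log:
    print(entry)
```

In this toy, both `yield` entries are logged before either callback entry, which is the ordering the question describes: the loop in `parse` cannot receive a response back at the point of the `yield`.
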
The solution is to do the opposite: pass the data you need from the parse method into the Request, so that you can process it in parse_monument.
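
One reason to attach the data to the request itself, rather than to an attribute on the spider, is that the loop has usually moved on by the time a callback runs. The following sketch compares the two approaches in plain Python. `ToyRequest`, `ToySpider`, and `run` are hypothetical stand-ins rather than Scrapy classes, and the toy engine simply collects every request before calling back.

```python
class ToyRequest:
    """Hypothetical stand-in for scrapy.Request with a meta dict."""

    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = dict(meta or {})


class ToySpider:
    def __init__(self):
        self.current_city = None  # shared state, overwritten by the loop
        self.seen = []

    def parse(self, monuments_by_city):
        for city, urls in monuments_by_city.items():
            self.current_city = city
            for url in urls:
                # meta keeps the city that belongs to this particular request.
                yield ToyRequest(url, self.parse_monument, meta={"data": city})

    def parse_monument(self, request):
        # In Scrapy this would read response.meta; the toy passes the request.
        from_meta = request.meta.get("data")
        self.seen.append((request.url, from_meta, self.current_city))


def run(requests):
    """Toy engine: collect all requests first, then run the callbacks."""
    for request in list(requests):
        request.callback(request)


spider = ToySpider()
run(spider.parse({"London": ["/big-ben"], "Paris": ["/louvre"]}))
for url, from_meta, from_attribute in spider.seen:
    print(url, from_meta, from_attribute)
```

Here the value read from `meta` matches each URL, while the shared attribute reports the last city of the loop for both requests.
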
from scrapy import Request, Spider


class Hello(Spider):
    def parse(self, response):
        # cities and sel come from the question's own setup (pickle and selenium)
        for c in cities:
            divs = sel.xpath('some xpath/div')
            for div in divs:
                # .extract() returns strings, which ''.join() needs
                monument_url = ''.join(div.xpath('some xpath').extract())
                data = ...  # set the data that you need from this loop
                # pass the data into the request's meta
                yield Request(monument_url, self.parse_monument, meta={'data': data})

    def parse_monument(self, response):
        # retrieve the data from the response's meta
        data = response.meta.get('data')
        ...

https://stackoverflow.com/questions/24912661