How do I return to the calling parse function when using yield in Scrapy?

Stack Overflow user

Asked 2014-07-23 22:04:36

2 answers, 2.4K views, 0 followers, 0 votes

Here is what I am trying to achieve:

class Hello(Spider):
    #some stuff
    def parse(self, response):
        #get a list of url of cities using pickle and store in a list
        #Now for each city url I have to get list of monuments (using selenium) which is achieved by the below loops
        for c in cities:
            #get the list of monuments using selenium and iterate through each monument url contained in the division
            divs = sel.xpath('some xpath/div')
            for div in divs:
                monument_url = ''.join(div.xpath('some xpath'))
                #For each monument url get the response and scrape the information
                yield Request(monument_url, self.parse_monument)
    def parse_monument(self, response):
        #scrape some information and return to the loop(i.e. return to "for div in divs:")
        pass

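What happens now is:

1. I get the list of all monuments in all cities before the yield statement is executed.

2. Whenever the yield statement is executed, control goes to the parse_monument function and does not return to the loop, so only the monuments of the first city are scraped.

Is there a way to do this? Is there a way to get the response object that the Request passes to parse_monument without going into the parse_monument method, so that I can use selectors on the response to pick out the elements I need?

Thank you!!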

scrapy

yield

python

2 Answers

Stack Overflow user

Answered 2014-07-24 17:03:27

I don't think you can call back into a function the way you are trying to. Here is a refactoring:

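import scrapy
from scrapy import Request


class HelloSpider(scrapy.Spider):
    name = "hello"
    allowed_domains = ["hello.com"]
    # Trailing comma matters: without it this is a plain string, not a tuple
    start_urls = (
        'http://hello.com/cities',
    )

    def parse(self, response):
        cities = ['London', 'Paris', 'New-York', 'Shanghai']
        for city in cities:
            # Placeholder XPath that selects the monument links for one city
            xpath_exp = 'some xpath[city="' + city + '"]/div/some xpath'
            for monument_url in response.xpath(xpath_exp).extract():
                # Scrapy schedules the request and calls parse_monument later
                yield Request(monument_url, callback=self.parse_monument)

    def parse_monument(self, response):
        # Scrape the monument page here
        pass

Votes: 0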

Stack Overflow user

Answered 2014-12-27 17:34:39

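Request is an object, not a method. Scrapy processes the Request objects you yield and executes their callbacks asynchronously. You can think of a Request as something like a thread object.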
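To make the ordering concrete, here is a minimal pure-Python sketch of the idea. It is not Scrapy's engine: the `Request` class, the `run` function, and the example.com URLs are all hypothetical stand-ins, and real Scrapy interleaves downloads and callbacks without guaranteeing order. It only illustrates that `yield` hands a request to whoever is iterating the generator, after which the loop in `parse` resumes, and the callback runs separately later.

```python
from collections import deque


class Request:
    """Toy stand-in for scrapy.Request: a URL plus the callback to run later."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


def parse(log):
    # Each yield hands one request to the engine; the loop then resumes.
    for city in ["london", "paris"]:
        log.append(f"parse: yielding request for {city}")
        yield Request(f"http://example.com/{city}", parse_monument)


def parse_monument(response, log):
    # In this toy model the "response" is simply the request URL.
    log.append(f"parse_monument: handling {response}")


def run(start, log):
    """Toy engine: collect every yielded request, then run the callbacks."""
    pending = deque(start(log))
    while pending:
        request = pending.popleft()
        request.callback(request.url, log)
    return log


events = run(parse, [])
for line in events:
    print(line)
```

In this sketch both "yielding" lines are logged before either callback runs, which matches the behaviour described in the question: `parse` is never re-entered from `parse_monument`, because nothing ever left it.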

The solution is to do it the other way around: pass the data you need from the parse method into the request, so that you can process it in parse_monument.

from scrapy import Request, Spider


class Hello(Spider):

    def parse(self, response):
        # cities and sel come from the question's elided setup code
        for c in cities:
            divs = sel.xpath('some xpath/div')
            for div in divs:
                # extract() returns strings, which join() needs
                monument_url = ''.join(div.xpath('some xpath').extract())

                data = ...   # set the data that you need from this loop

                # pass the data into request's meta
                yield Request(monument_url, self.parse_monument, meta={'data': data})

    def parse_monument(self, response):
        # retrieve the data from response's meta
        data = response.meta.get('data')
        ...

Votes: 0
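
The data flow above can be tried without a network using a small pure-Python model. This is only a sketch under stated assumptions: `Request`, `Response`, `crawl`, the `MONUMENTS` table, and the example.com URLs are hypothetical stand-ins written for this illustration, not Scrapy's classes. The one behaviour it borrows from Scrapy is that the `meta` dict attached to a request is readable as `response.meta` in the callback.

```python
from collections import deque

# Hypothetical data standing in for what the XPath queries would return.
MONUMENTS = {"london": ["tower-bridge", "big-ben"], "paris": ["eiffel-tower"]}


class Request:
    """Toy request: URL, callback, and a meta dict that travels with it."""

    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = dict(meta or {})


class Response:
    """Toy response that exposes the originating request's meta."""

    def __init__(self, request):
        self.url = request.url
        self.meta = request.meta


def parse(response):
    for city, names in MONUMENTS.items():
        for name in names:
            url = f"http://example.com/{city}/{name}"
            # Attach the loop state the callback will need later.
            yield Request(url, parse_monument, meta={"city": city})


def parse_monument(response):
    # The city set inside parse's loop is available again here.
    yield {"city": response.meta.get("city"), "url": response.url}


def crawl(start_url):
    """Toy engine: queue yielded requests, collect yielded items."""
    items = []
    pending = deque([Request(start_url, parse)])
    while pending:
        request = pending.popleft()
        for result in request.callback(Response(request)):
            if isinstance(result, Request):
                pending.append(result)
            else:
                items.append(result)
    return items


items = crawl("http://example.com/cities")
for item in items:
    print(item)
```

Because each request carries its own copy of the loop state, the callback does not need to "return to the loop": every monument item already knows which city it belongs to.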

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/24912661
