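When I run multiple spiders in a loop, I want to keep track of how many of them have finished. I tried using signals, but my spider does not seem to see other modules outside its own scope. What I want is to register, in another script, that a crawl is done, for example by passing or updating a variable.
Example code (abbreviated, to illustrate the problem):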
Controller.py
isWikipediaDone = False
for file in Spiders:
    process.crawl(file)
print(isWikipediaDone)

wikipediaSpider.py
import scrapy
from scrapy import signals

class WikipediaSpider(scrapy.Spider):
    .... initialize ....

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(WikipediaSpider, cls).from_crawler(crawler, *args, **kwargs)
        # Call spider_closed() when this spider finishes.
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        print("Now we are done updating variable in Controller.py!")
        Controller.isWikipediaDone = True

Posted on 2017-03-29 16:37:13
You can create a controller class and then import it into your spider:
# controller.py
class Controller:
    def mark_as_done(self, spider):
        print("{} is done!".format(spider.name))

controller = Controller()

Then connect the controller's method to the signal inside the spider:
# myspider.py
from scrapy import signals
from mypackage.controller import controller
...
crawler.signals.connect(controller.mark_as_done, signals.spider_closed)

https://stackoverflow.com/questions/43096194
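Since the question asks about tracking how many spiders are done, here is a minimal sketch of how the shared controller could record completions instead of only printing. It runs without Scrapy: `SimpleNamespace` objects stand in for spiders (only their `name` attribute is used), and the `finished` list and `is_done` helper are hypothetical names introduced for illustration, not part of the original answer. In a real project, `mark_as_done` would be the handler connected to `signals.spider_closed`.

```python
from types import SimpleNamespace


class Controller:
    """Collects the names of spiders that have reported completion."""

    def __init__(self):
        # Hypothetical attribute: names of finished spiders, in order.
        self.finished = []

    def mark_as_done(self, spider):
        # In Scrapy, this method would be connected to signals.spider_closed.
        self.finished.append(spider.name)

    def is_done(self, name):
        # Hypothetical helper: has the spider with this name finished?
        return name in self.finished


# Single shared instance, imported by every spider module.
controller = Controller()

# Stand-ins for real spiders; assumed names, used only for this demo.
for name in ("wikipedia", "example"):
    controller.mark_as_done(SimpleNamespace(name=name))

print(len(controller.finished))
print(controller.is_done("wikipedia"))
```

Keeping the state on one module-level instance means every spider that imports `controller` updates the same object, so the driving script can check `controller.finished` once the crawl has ended.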