I have two Python Scrapy crawlers that each run fine on their own.
They are part of an analysis I want to run, so I want to import both of them into a single script:
from crawler1 import *  # import the module "crawler1.py" — the .py suffix must be dropped
from crawler2 import *

Further down in my script I have something like this:
if <condition1>:
    # run crawler1
    runCrawler('crawlerName', '/dir1/dir2/')
if <condition2>:
    # run crawler2
    runCrawler('crawlerName', '/dir1/dir2/')

where runCrawler is:
def runCrawler(crawlerName, crawlerFileName):
    print('Running crawler for ' + crawlerName)
    process = CrawlerProcess(  # scrapy.crawler.CrawlerProcess, imported as CP in my script
        settings={
            'FEED_URI': crawlerFileName,
            'FEED_FORMAT': 'csv'
        }
    )
    process.crawl(globals()[crawlerName])
    process.start()

I get the following error:
Exception has occurred: ReactorAlreadyInstalledError
reactor already installed

The first crawler runs fine; the second one fails. Any ideas?
I am running the code above through a visual debugger.
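For context (my own stdlib analogy, not from the thread): a Twisted reactor is a run-once, per-process singleton, so the first CrawlerProcess.start() installs and runs it, and the second CrawlerProcess then trips over the already-installed, already-finished reactor. The behavior is much like an asyncio event loop, which also cannot be run again once closed:

```python
# Stdlib analogy (assumption, not from the thread): an asyncio event loop,
# like a Twisted reactor, is a run-once object within a process.
import asyncio

async def crawl_stub():
    # stand-in for a crawl; the real work would happen inside Scrapy's reactor
    return 'done'

loop = asyncio.new_event_loop()
first_result = loop.run_until_complete(crawl_stub())  # works, like crawler1
loop.close()

try:
    loop.run_until_complete(crawl_stub())  # fails, like crawler2
    second_error = None
except RuntimeError as exc:
    second_error = str(exc)  # "Event loop is closed"
```

The analogy is loose (Scrapy fails at reactor *installation* rather than restart), but the underlying constraint is the same: one event loop per process, started once.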
Posted on 2022-07-18 21:49:32
The best way is to do the following.
Your code should be:
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

configure_logging()

# your code
settings = {
    'FEED_FORMAT': 'csv'
}
process = CrawlerRunner(settings)  # note: lowercase `settings`, the dict defined above

if condition1:
    process.crawl(spider1, crawlerFileName=crawlerFileName)
if condition2:
    process.crawl(spider2, crawlerFileName=crawlerFileName)

d = process.join()
d.addBoth(lambda _: reactor.stop())
reactor.run()  # it will run both crawlers and the code after this line

Your spider should look like:
import scrapy

class spider1(scrapy.Spider):
    name = "spider1"
    # custom_settings is a class attribute read before the spider is
    # instantiated, so it cannot reference the crawlerFileName argument
    # passed to process.crawl(); give it a concrete path (example below)
    custom_settings = {'FEED_URI': '/dir1/dir2/spider1.csv'}

    def start_requests(self):
        yield scrapy.Request('https://scrapy.org/')

    def parse(self, response):
        pass

https://stackoverflow.com/questions/73016181
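An alternative workaround (my own sketch, not from the answer above): keep the original runCrawler with CrawlerProcess, but launch each call in a separate OS process, so every crawl gets a brand-new Twisted reactor. The inline print is a stand-in for invoking the real crawler script (e.g. a file that calls runCrawler, or `scrapy crawl <name>`):

```python
# Sketch (assumption): isolate each crawl in its own OS process so each
# CrawlerProcess installs a fresh reactor and never collides with another.
import subprocess
import sys

def run_in_fresh_process(crawler_name, feed_uri):
    # Stand-in command: in the real setup this would run a script that calls
    # runCrawler(crawler_name, feed_uri) inside the child process.
    code = f"print('Running crawler for {crawler_name} -> {feed_uri}')"
    return subprocess.run([sys.executable, '-c', code],
                          capture_output=True, text=True, check=True)

first = run_in_fresh_process('crawler1', '/dir1/dir2/')
second = run_in_fresh_process('crawler2', '/dir1/dir2/')
```

This trades the elegance of one shared CrawlerRunner for complete isolation, which is often simpler when the two crawlers are meant to stay independent scripts.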