This is my shell script:
#!/bin/bash
crawlers(){
nohup scrapy crawl a &
nohup scrapy crawl b &
wait $!
nohup scrapy crawl f &
nohup scrapy crawl g &
wait $!
nohup scrapy crawl h &
nohup scrapy crawl i &
wait $!
nohup scrapy crawl i &
nohup scrapy crawl j &
nohup scrapy crawl k &
wait $!
nohup scrapy crawl l &
nohup scrapy crawl m &
}
PATH=$PATH:/usr/local/bin
export PATH
python add_columns.py &
wait $!
crawlers &
wait $!
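python final_script.py &
I want to run the add_columns.py script first,
then the crawlers (all the crawl commands inside crawlers run asynchronously),
and finally final_script.py.
But with the shell script above,
final_script.py starts executing before these have finished: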
nohup scrapy crawl l &
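nohup scrapy crawl m &
even though I wait for them with:
crawlers &
wait $!
So how can I make sure final_script.py is called only after all the jobs in the crawlers() function have completed?
Thanks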
Answered 2012-11-03 18:51:44
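First, why bother backgrounding commands that you are going to wait on immediately?
Second, in the crawlers function each wait $! waits only for the most recently started background job; the other jobs in that batch may still be running, and nothing at all waits for the last two (l and m).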
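To see the difference without running any crawlers, here is a minimal sketch that uses sleep as a stand-in for the scrapy jobs. The marker file and the variable names are made up for the demonstration; only wait and $! come from the script in question.

```shell
#!/bin/bash
# Hypothetical stand-ins for two crawlers: a slow job and a fast job.
marker_dir=$(mktemp -d)
slow_marker="$marker_dir/slow.done"

( sleep 3; touch "$slow_marker" ) &   # started first, finishes last
( sleep 1 ) &                         # started last, so $! is this job's PID

wait $!                               # returns as soon as the fast job exits
if [ -e "$slow_marker" ]; then after_wait_pid="done"; else after_wait_pid="still running"; fi

wait                                  # no arguments: returns once every child has exited
if [ -e "$slow_marker" ]; then after_wait_all="done"; else after_wait_all="still running"; fi

echo "after 'wait \$!': slow job is $after_wait_pid"
echo "after 'wait':    slow job is $after_wait_all"
rm -rf "$marker_dir"
```

The first line of output should report the slow job as still running and the second as done, which is the same thing that happens to the crawlers that were not the last one started before each wait $!.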
Use wait with no arguments to wait for all currently active children to exit. Here is a better version:
#!/bin/bash
crawlers(){
nohup scrapy crawl a &
nohup scrapy crawl b &
nohup scrapy crawl f &
nohup scrapy crawl g &
nohup scrapy crawl h &
nohup scrapy crawl i &
nohup scrapy crawl i &
nohup scrapy crawl j &
nohup scrapy crawl k &
nohup scrapy crawl l &
nohup scrapy crawl m &
wait
}
PATH=$PATH:/usr/local/bin
export PATH
python add_columns.py
crawlers
python final_script.py
https://stackoverflow.com/questions/13208290
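If the batches in the question were meant to limit how many crawlers run at once (an assumption, the question does not say so), the same bare wait can be placed after each batch instead of once at the end. A rough sketch follows; start_crawler and run_batch are hypothetical helper names, and the grouping of spider names is only illustrative.

```shell
#!/bin/bash
# start_crawler and run_batch are hypothetical helpers, not part of the original script.
start_crawler() {
    nohup scrapy crawl "$1"
}

# Start one crawler per argument in the background,
# then block until all of them have exited.
run_batch() {
    local name
    for name in "$@"; do
        start_crawler "$name" &
    done
    wait   # no arguments: waits for every child of this shell
}

crawlers() {
    run_batch a b
    run_batch f g
    run_batch h i
    run_batch j k
    run_batch l m
}
```

With these definitions the rest of the script stays as in the version above: python add_columns.py, then crawlers, then python final_script.py, all in the foreground.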