Hi, I am using Scrapy to scrape a website.
I wrote a spider that fetches all the information and saves it to a CSV file through pipeline.py.
pipeline.py code:
    # Imports assumed for the Scrapy version in use at the time (Python 2)
    import csv
    from datetime import datetime

    from scrapy import log, signals
    from scrapy.xlib.pydispatch import dispatcher


    class Examplepipeline(object):

        def __init__(self):
            dispatcher.connect(self.spider_opened, signal=signals.spider_opened)
            dispatcher.connect(self.spider_closed, signal=signals.spider_closed)

        def spider_opened(self, spider):
            log.msg("opened spider %s at time %s" % (spider.name, datetime.now().strftime('%H-%M-%S')))
            # "-" rather than "/" in the date, so the timestamp is not read as a directory path
            self.exampledotcomCsv = csv.writer(
                open("csv's/%s(%s).csv" % (spider.name, datetime.now().strftime("%d-%m-%Y,%H-%M-%S")), "wb"),
                delimiter=',', quoting=csv.QUOTE_MINIMAL)
            self.exampledotcomCsv.writerow(['field1', 'field2', 'field3', 'field4'])

        def process_item(self, item, spider):
            log.msg("Processing item " + item['title'], level=log.DEBUG)
            # Write through the same writer that spider_opened created
            self.exampledotcomCsv.writerow([item['field1'].encode('utf-8'),
                                            [i.encode('utf-8') for i in item['field2']],
                                            item['field3'].encode('utf-8'),
                                            [i.encode('utf-8') for i in item['field4']]
                                            ])
            return item

        def spider_closed(self, spider):
            log.msg("closed spider %s at %s" % (spider.name, datetime.now().strftime('%H-%M-%S')))

With the code above I can log the start time and end time of the spider. After the spider closes, though, I want to calculate and display the total time it took, i.e. the difference between the start time and the end time. How can I do that? Can this be written in the spider_closed method?
Please let me know.
Posted on 2012-07-06 11:19:57
Why not:
    def spider_opened(self, spider):
        spider.started_on = datetime.now()
        ...

    def spider_closed(self, spider):
        work_time = datetime.now() - spider.started_on
        ...

https://stackoverflow.com/questions/11357925
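To make the idea in the answer concrete, here is a minimal sketch that runs without Scrapy. `TimingPipeline` and `DummySpider` are hypothetical names made up for illustration, and the signal wiring from the question's `__init__` is left out. The sketch only shows that subtracting two `datetime` values gives a `timedelta`, which can be printed directly or converted to seconds.

```python
from datetime import datetime


class TimingPipeline(object):
    """Hypothetical pipeline that only tracks how long a spider ran."""

    def spider_opened(self, spider):
        # Store the start time on the spider object, as the answer suggests.
        spider.started_on = datetime.now()

    def spider_closed(self, spider):
        # datetime - datetime yields a timedelta.
        work_time = datetime.now() - spider.started_on
        print("spider %s ran for %s (%.2f seconds)"
              % (spider.name, work_time, work_time.total_seconds()))
        # Returned only so the sketch is easy to check by hand.
        return work_time


class DummySpider(object):
    """Stand-in for a Scrapy spider; only the name attribute is used."""
    name = "example"
```

In the real pipeline the same two lines would probably go into the existing `spider_opened` and `spider_closed` handlers, with `log.msg` in place of `print`. Storing the start time on `self` instead of on the spider should also work when the pipeline serves a single spider.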