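I have this code, which scrapes a few hundred pages for me. However, sometimes the XPath for a does not exist on a page at all. How can I edit it so that the script doesn't stop, but carries on to get b and still yields that information for the page in question?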
`a = response.xpath("//div[@class='headerDiv']/a/@title").extract()[0]
b = response.xpath("//div[@class='headerDiv']/text()").extract()[0].strip()
items['title'] = a + " " + b
yield items`

Posted 2016-10-19 11:43:31
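Just check the result of extract() before indexing into it. It returns a list, which is empty when the XPath matches nothing.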
nodes = response.xpath("//div[@class='headerDiv']/a/@title").extract()
a = nodes[0] if nodes else ""
nodes = response.xpath("//div[@class='headerDiv']/text()").extract()
b = nodes[0].strip() if nodes else ""
items['title'] = a + " " + b
yield items

Or, following Padraic Cunningham's advice, use extract_first() with a default:
a = response.xpath("//div[@class='headerDiv']/a/@title").extract_first(default='')
b = response.xpath("//div[@class='headerDiv']/text()").extract_first(default='').strip()
items['title'] = (a + " " + b).strip()
yield items

Posted 2016-10-19 12:06:00
You can also parse the page with lxml and check the result before indexing:
import lxml.etree as etree

# `data` is assumed to hold the page markup (for example, response.body).
parser = etree.XMLParser(strip_cdata=False, remove_comments=True)
root = etree.fromstring(data, parser)

# xpath() returns a list, so only take index 0 when the list is non-empty.
a = root.xpath("//div[@class='headerDiv']/a/@title")
b = root.xpath("//div[@class='headerDiv']/text()")
b_text = b[0].strip() if b else ""
if a:
    items['title'] = a[0].strip() + " " + b_text
else:
    items['title'] = b_text
yield items

https://stackoverflow.com/questions/40130194
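
The answers above all come down to the same guard: treat the XPath result as a list that may be empty, and fall back to a default instead of indexing blindly. Here is a minimal sketch of that pattern which runs without Scrapy, using plain lists as stand-ins for what extract() returns. The names first_or_default and build_title are hypothetical helpers made up for this illustration, not part of Scrapy or lxml.

```python
def first_or_default(values, default=""):
    """Return the first element of `values`, or `default` if it is empty."""
    return values[0] if values else default


def build_title(link_titles, text_nodes):
    """Join the optional link title and the div text into one string."""
    a = first_or_default(link_titles)
    b = first_or_default(text_nodes).strip()
    # The outer strip() drops the leading space left behind when `a` is empty.
    return (a + " " + b).strip()


# Simulated extract() results (hypothetical data): one page that has the
# link, and one page where the link XPath matched nothing.
print(build_title(["Widget"], ["  in stock  "]))
print(build_title([], ["  in stock  "]))
```

In a spider, the two lists would be the results of the two response.xpath(...).extract() calls, and the returned string would go into items['title'].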