I am trying to extract price information from the following two pages:
http://jujumarts.com/mobiles-accessories-smartphones-wildfire-sdarkgrey-p-551.html
http://jujumarts.com/computers-accessories-transcend-500gb-portable-storejet-25d2-p-2616.html
xpath1 = //span[@class='productSpecialPrice']//text()
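xpath2 = //div[@class='proDetPrice']//text()
So far I have written Python code that returns the result of xpath1 if it matches anything and otherwise evaluates xpath2. I have a feeling this logic can be implemented in XPath alone. Can someone tell me how?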
Posted on 2013-04-23 12:49:52
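Use | to express a union:
xpath3 = "//span[@class='productSpecialPrice']//text()|//div[@class='proDetPrice']//text()"
This is not exactly what you asked for, but I think it can be worked into a viable solution.
From the XPath (version 1.0) specification:
The | operator computes the union of its operands, which must be node-sets.
For example,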
import lxml.html as LH

urls = [
    'http://jujumarts.com/mobiles-accessories-smartphones-wildfire-sdarkgrey-p-551.html',
    'http://jujumarts.com/computers-accessories-transcend-500gb-portable-storejet-25d2-p-2616.html'
]
xpaths = [
    "//span[@class='productSpecialPrice']//text()",
    "//div[@class='proDetPrice']//text()",
    "//span[@class='productSpecialPrice']//text()|//div[@class='proDetPrice']//text()"
]
for url in urls:
    doc = LH.parse(url)
    for xpath in xpaths:
        print(doc.xpath(xpath))
yields
['Rs.11,800.00']
['Rs.13,299.00', 'Rs.11,800.00']
['Rs.13,299.00', 'Rs.11,800.00']
[]
['Rs.7,000.00']
['Rs.7,000.00']
Another way to get the information you want is
"//*[@class='productSpecialPrice' or @class='proDetPrice']//text()"
https://stackoverflow.com/questions/16169603
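For the fallback behaviour originally asked about (use the special price when it exists, otherwise the regular price), one possible approach is to guard the second branch of a union with a not(...) predicate, so that branch only matches when no special-price span is present anywhere in the document. The sketch below is not taken from the answer above: the HTML fragments, FALLBACK_XPATH, and extract_price are hypothetical and only approximate the page structure suggested by the output shown earlier.

```python
import lxml.html as LH

# Hypothetical fragments standing in for the two product pages (assumed markup).
PAGE_WITH_SPECIAL = """
<html><body>
<div class="proDetPrice"><s>Rs.13,299.00</s>
<span class="productSpecialPrice">Rs.11,800.00</span></div>
</body></html>
"""
PAGE_WITHOUT_SPECIAL = """
<html><body><div class="proDetPrice">Rs.7,000.00</div></body></html>
"""

# First branch: text of the special-price span, if any.
# Second branch: text of the price div, but only when the document
# contains no special-price span at all (the not(...) guard).
FALLBACK_XPATH = (
    "//span[@class='productSpecialPrice']//text()"
    " | //div[@class='proDetPrice']"
    "[not(//span[@class='productSpecialPrice'])]//text()"
)


def extract_price(html):
    """Return the non-empty price strings selected by FALLBACK_XPATH."""
    doc = LH.fromstring(html)
    # Drop whitespace-only text nodes left over from markup indentation.
    return [text.strip() for text in doc.xpath(FALLBACK_XPATH) if text.strip()]


print(extract_price(PAGE_WITH_SPECIAL))
print(extract_price(PAGE_WITHOUT_SPECIAL))
```

Because the guard uses an absolute path, it tests the whole document rather than the current div. That fits a page with a single product; a page listing several products would likely need a relative test such as [not(.//span[@class='productSpecialPrice'])] instead.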