I can get a list of products from a web page with the following code:
from lxml import html
import requests
page = requests.get('http://monument.pl/pol_m_DESKOROLKA_Deski-162.html')
tree = html.fromstring(page.content)
VENDORLISTn = tree.xpath('//a[@class="firm_name"]/text()')
print(VENDORLISTn)

I get the following result:
['Almost', 'Almost', 'Almost', 'Enjoi', 'Real', 'Boulevard', 'Almost', 'Almost', 'Enjoi', 'Enjoi', 'Enjoi', 'Blind', 'Blind', 'Blind', 'Blind', 'Blind', 'Blind', 'Blind', 'Cliche', 'Blind', 'Blind', 'Blind', 'Enjoi', 'Enjoi', 'Enjoi', 'Enjoi', 'Enjoi', 'Enjoi', 'Enjoi', 'Antihero']

How can I get a list of the paths to these elements? It might look something like this:
['//*[@id="search"]/table/tbody/tr[1]/td[1]/div/div[3]/div/a','//*[@id="search"]/table/tbody/tr[1]/td[2]/div/div[3]/div/a',etc....

Posted on 2017-03-06 13:31:44
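VENDORLISTn is just a list of strings. I don't think there is a way to generate an XPath from those, but you can select the link elements themselves and get the absolute XPath of each one, like this: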
from lxml import etree
from lxml import html
import requests
page = requests.get('http://monument.pl/pol_m_DESKOROLKA_Deski-162.html')
tree = html.fromstring(page.content)
VENDORLISTn = tree.xpath('//a[@class="firm_name"]')
for link in VENDORLISTn:
    # Echoed automatically in an interactive session; wrap in print() when running as a script.
    etree.ElementTree(tree).getpath(link)

Output:
'/html/body/div[1]/div/div[2]/div/div[2]/div/div[7]/table/tr[1]/td[1]/div/div[3]/div/a'
'/html/body/div[1]/div/div[2]/div/div[2]/div/div[7]/table/tr[1]/td[2]/div/div[3]/div/a'
'/html/body/div[1]/div/div[2]/div/div[2]/div/div[7]/table/tr[1]/td[3]/div/div[3]/div/a'
....

https://stackoverflow.com/questions/42626560
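
To try the same technique without fetching the page, here is a minimal offline sketch. The `SAMPLE_HTML` markup is a made-up stand-in (the real page layout may differ), and `vendor_paths` is a hypothetical name. It uses `getroottree()` to obtain the tree once instead of building a new `ElementTree` on every iteration, then pairs each vendor name with the absolute path of its element.

```python
from lxml import html

# Hypothetical stand-in markup; the real page is larger and nested differently.
SAMPLE_HTML = """
<html><body>
  <div id="search">
    <table>
      <tr>
        <td><div><a class="firm_name" href="#">Almost</a></div></td>
        <td><div><a class="firm_name" href="#">Enjoi</a></div></td>
      </tr>
    </table>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)
root_tree = tree.getroottree()  # get the ElementTree once, outside the loop

# Pair each vendor name with the absolute XPath of its <a> element.
vendor_paths = [
    (link.text, root_tree.getpath(link))
    for link in tree.xpath('//a[@class="firm_name"]')
]

for name, path in vendor_paths:
    print(name, path)
```

Each generated path can be passed back to `tree.xpath(path)` to select the same element again. The paths will probably not match what a browser's developer tools report: browsers insert a `tbody` element into tables, while lxml keeps the markup as written, which is likely why the output above has no `tbody` step.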