Q: Python html module: return all occurrences of a tag inside a child node

Stack Overflow user

Asked 2020-11-09 02:10:06

1 answer · 33 views · 0 followers · score 1

I am using the requests and lxml modules to traverse the page fetched below.

from lxml import html
import requests
page_dossier_legislatif = requests.get("http://www.senat.fr/dossier-legislatif/plfss2020.html")
tree = html.fromstring(page_dossier_legislatif.content)

I am capturing the divs with id = timeline-6 and timeline-8 as follows:

div_dynamic_content = tree.xpath('//*[@id="box-timeline"]/div[2]')[0]
# xpath() always returns a list, so take the first match to get the element itself
list_div_steps = [div_dynamic_content.xpath("div[@id='timeline-6']")[0], div_dynamic_content.xpath("div[@id='timeline-8']")[0]]

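I want to get the links inside each of these nodes, but I am not familiar with XPath and cannot find the right query. The only one I have found is the query below, which gives me a list of every link on the page rather than only the links inside these divs.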

for div_etape_procedure in list_div_steps:
    print(div_etape_procedure.xpath('//a/@href'))
    print("--")

# Desired result should print :
# ['/amendements/2019-2020/98/accueil.html', '/interventions/crisom_plfss2020_1.html',
# '/interventions/criresume_plfss2020_1.html', '/scrupub/dossiers/plfss2020_scr.html#lec1']
# --
# ['/amendements/2019-2020/98/accueil.html', '/interventions/crisom_plfss2020_1.html',
# '/interventions/criresume_plfss2020_1.html', '/scrupub/dossiers/plfss2020_scr.html#lec1',
# '/leg/tas19-026.html']

Thanks.

PS: I am not sure how to phrase this question. If you can think of a better wording, feel free to change the title.

python

xpath

lxml

1 Answer

Stack Overflow user

Answered 2020-11-09 04:11:18

Try this code to get all links from the divs with @id='timeline-6' and @id='timeline-8':

links = tree.xpath('//div[@id="timeline-6" or @id="timeline-8"]/ul//a/@href')
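
To see what the combined predicate selects without fetching the live page, here is a minimal sketch on a made-up document. Only the ids timeline-6 and timeline-8 come from the question; the markup, the extra timeline-7 div and the hrefs are assumptions for illustration.

```python
from lxml import html

# Hypothetical document: only the ids are taken from the question,
# the structure and the hrefs are invented for this sketch.
SAMPLE = """<html><body>
<div id="timeline-6"><ul><li><a href="/six.html">six</a></li></ul></div>
<div id="timeline-7"><ul><li><a href="/seven.html">seven</a></li></ul></div>
<div id="timeline-8"><ul><li><a href="/eight.html">eight</a></li></ul></div>
</body></html>"""

sample_tree = html.fromstring(SAMPLE)

# The "or" predicate keeps a div when either id matches; "/ul//a/@href"
# then collects the href of every <a> below a <ul> child of that div.
sample_links = sample_tree.xpath(
    '//div[@id="timeline-6" or @id="timeline-8"]/ul//a/@href'
)
print(sample_links)
```

The hrefs come back in document order, and the link inside timeline-7 is skipped. Note that the query only finds links that sit under a ul that is a direct child of the div.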

If you want to keep the output of the two divs separate:

links_6 = tree.xpath('//div[@id="timeline-6"]/ul//a/@href')
links_8 = tree.xpath('//div[@id="timeline-8"]/ul//a/@href')
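
As for why the loop in the question prints every link on the page: an XPath expression that starts with `//` is evaluated from the document root, even when `xpath()` is called on a child element. Prefixing it with a dot (`.//`) makes it relative to that element. A minimal sketch, again on an invented document rather than the real page:

```python
from lxml import html

# Hypothetical document, invented for this sketch.
SAMPLE = """<html><body>
<a href="/outside.html">outside</a>
<div id="timeline-6"><ul><li><a href="/six.html">six</a></li></ul></div>
<div id="timeline-8"><ul><li><a href="/eight.html">eight</a></li></ul></div>
</body></html>"""

sample_tree = html.fromstring(SAMPLE)
div_6 = sample_tree.xpath('//div[@id="timeline-6"]')[0]

# Leading "//": searched from the document root, so every link matches.
absolute = div_6.xpath('//a/@href')

# Leading ".//": searched only among the descendants of div_6.
relative = div_6.xpath('.//a/@href')

print(absolute)
print(relative)
```

With this change the asker's own loop could stay as it is, replacing `'//a/@href'` with `'.//a/@href'`.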

Score: -1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/64741484
