
Finding tags with a partial id value using Beautiful Soup

Stack Overflow user
Asked on 2020-04-14 20:36:42
Answers: 2 · Views: 139 · Follows: 0 · Votes: 0

Here is a sample of my HTML file:

```html
<div id="document_64219">
    <div id="part_12">
    <p class="keywords_HTAG">
        <strong> Key Words : </strong>
        <ol>
            <li>Decentralization</li>
            <li>Planning</li>
        </ol>
    </p>
```

I want to parse each div whose id starts with "part_" and get its <li> elements.

```python
myDivs = soup.findAll("div", id=re.compile("^brique_"))
myParagraph = myDivs.find_all_next("p", {"class" : "keywords_HTAG"})

for div in myDivs :
    for paragraph in myParagraph :
        li = paragraph.find_all_next("li")
        print (li)
```

This returns nothing for me. I'm used to Jsoup, which I find intuitive enough, but I'm not sure how to do this with Beautiful Soup.
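The loop above cannot work as written. `findAll()` returns a `ResultSet`, which is a list of tags, so calling `find_all_next()` on the list itself raises an `AttributeError` instead of searching. The regex also looks for `^brique_`, while the sample ids start with `part_`. The sketch below searches inside each matched div. It assumes the ids really start with `part_`, and the variable names `keywords` and `same_divs` are illustrative only. It also shows an equivalent filter that passes a function instead of a regex; Beautiful Soup calls that function with each tag's `id` value.

```python
import re

from bs4 import BeautifulSoup

# Sample markup from the question (the unclosed divs are kept as posted).
html = """
<div id="document_64219">
    <div id="part_12">
    <p class="keywords_HTAG">
        <strong> Key Words : </strong>
        <ol>
            <li>Decentralization</li>
            <li>Planning</li>
        </ol>
    </p>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list of Tag objects, so loop over it and
# search inside each div instead of calling methods on the list.
keywords = {}
for div in soup.find_all("div", id=re.compile(r"^part_")):
    items = keywords.setdefault(div["id"], [])
    for paragraph in div.find_all("p", class_="keywords_HTAG"):
        items.extend(li.get_text(strip=True) for li in paragraph.find_all("li"))

print(keywords)

# Same div selection without a regex: the function receives the id value
# (guarded against None in case a div has no id attribute).
same_divs = soup.find_all(
    "div", id=lambda value: value is not None and value.startswith("part_")
)
print([d["id"] for d in same_divs])
```

With the question's sample, `keywords` maps `part_12` to its two list items, and the `document_64219` div is skipped because its id does not start with `part_`.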

2 Answers

Stack Overflow user

Accepted answer

Posted on 2020-04-14 21:46:48

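You can use * in a CSS selector to match part of an attribute's value: div[id*="part_"] matches any div whose id contains "part_".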

```python
from bs4 import BeautifulSoup

html = """
<div id="document_64219">
    <div id="part_12">
    <p class="keywords_HTAG">
        <strong> Key Words : </strong>
        <ol>
            <li>first</li>
            <li>second</li>
        </ol>
    </p>
    <ol>
        <li>third</li>
        <li>fourth</li>
    </ol>
"""

soup = BeautifulSoup(html, 'html.parser')

# If you want every li inside a div whose id contains "part_"
# This returns first, second, third and fourth
lis = [li.text for li in soup.select('div[id*="part_"] li')]
print('All li ==>', lis)


# If you want only the li inside a div whose id contains "part_" and inside the p tag
# This returns first and second
lis = [li.text for li in soup.select('div[id*="part_"] p.keywords_HTAG li')]
print('within p only ==>', lis)


# If you want only the li inside a div whose id contains "part_", excluding the li in the p tag
# This returns third and fourth
lis = [li.text for li in soup.select('div[id*="part_"] > ol li')]
print('without li in p ==>', lis)
```

Output:

```text
All li ==> ['first', 'second', 'third', 'fourth']
within p only ==> ['first', 'second']
without li in p ==> ['third', 'fourth']
```
Votes: 1
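The `*=` operator matches the substring anywhere in the attribute. That means `div[id*="part_"]` would also select an id such as `subpart_3`. The question asks for ids that start with `part_`, so the CSS prefix operator `^=` is the closer fit. `select()` also supports it, since it is backed by Soup Sieve. Below is a small sketch; the `subpart_3` div is a made-up example added only to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "subpart_3" contains "part_" but does not start with it.
html = """
<div id="part_12"><ol><li>first</li></ol></div>
<div id="subpart_3"><ol><li>second</li></ol></div>
"""

soup = BeautifulSoup(html, "html.parser")

# [id*="part_"]: the id contains "part_" anywhere.
contains = [li.get_text() for li in soup.select('div[id*="part_"] li')]

# [id^="part_"]: the id begins with "part_".
starts_with = [li.get_text() for li in soup.select('div[id^="part_"] li')]

print("contains ==>", contains)        # contains ==> ['first', 'second']
print("starts with ==>", starts_with)  # starts with ==> ['first']
```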

Stack Overflow user

Posted on 2020-04-14 20:42:22

```python
import re
from bs4 import BeautifulSoup

html = """
<div id="document_64219">
    <div id="part_12">
    <p class="keywords_HTAG">
        <strong> Key Words : </strong>
        <ol>
            <li>Decentralization</li>
            <li>Planning</li>
        </ol>
    </p>
"""

soup = BeautifulSoup(html, 'html.parser')


for item in soup.find_all("div", id=re.compile("^part_")):
    print(item.find_all_next("li"))
```

Output:

```text
[<li>Decentralization</li>, <li>Planning</li>]
```

To get only the text of each item:

```python
for item in soup.find_all("div", id=re.compile("^part_")):
    print([li.text for li in item.find_all_next("li")])
```

Output:

```text
['Decentralization', 'Planning']
```
Votes: 0
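One caveat with this answer: `find_all_next()` returns every matching tag that comes after the div in document order, including tags outside the div. `find_all()`, by contrast, only searches the div's descendants. The two agree on the question's sample, but they diverge as soon as another list follows the div. The sketch below uses a hypothetical trailing `<ol>` that is not in the original markup.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the second <ol> sits outside the part_ div.
html = """
<div id="part_12">
    <ol><li>Decentralization</li><li>Planning</li></ol>
</div>
<ol><li>Outside</li></ol>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id=re.compile(r"^part_"))

# Every li after the div's start tag in document order, inside or outside it.
after = [li.get_text() for li in div.find_all_next("li")]

# Only the li elements that are descendants of the div.
inside = [li.get_text() for li in div.find_all("li")]

print("find_all_next ==>", after)  # find_all_next ==> ['Decentralization', 'Planning', 'Outside']
print("find_all ==>", inside)      # find_all ==> ['Decentralization', 'Planning']
```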
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/61207847
