Here is a sample of my HTML file:
<div id="document_64219">
<div id="part_12">
<p class="keywords_HTAG">
<strong> Key Words : </strong>
<ol>
<li>Decentralization</li>
<li>Planning</li>
</ol>
</p>
I want to parse every div whose id starts with "part_" and get the <li> elements inside it.
myDivs = soup.findAll("div", id=re.compile("^brique_"))
myParagraph = myDivs.find_all_next("p", {"class" : "keywords_HTAG"})
for div in myDivs :
    for paragraph in myParagraph :
        li = paragraph.find_all_next("li")
        print (li)
This doesn't return anything. I'm used to Jsoup, which I find intuitive enough, but I'm not sure how the same thing is done with Beautiful Soup.
Posted on 2020-04-14 21:46:48
You can use the *= operator in a CSS selector; it matches elements whose attribute value contains the given substring.
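A side note on the selector: *= matches the substring anywhere in the attribute value, so it would also match an id such as "counterpart_3". Since the question asks for ids that start with "part_", the prefix operator ^= may be a closer fit. A minimal sketch of the difference, assuming a made-up second div (the id "counterpart_3" is hypothetical and not from the question):

```python
from bs4 import BeautifulSoup

# "counterpart_3" is a hypothetical id, added only to show the difference
html = """
<div id="part_12"><ol><li>first</li></ol></div>
<div id="counterpart_3"><ol><li>other</li></ol></div>
"""
soup = BeautifulSoup(html, "html.parser")

# *= matches when the attribute value contains the substring anywhere
contains = [li.text for li in soup.select('div[id*="part_"] li')]

# ^= matches only when the attribute value starts with the string
starts_with = [li.text for li in soup.select('div[id^="part_"] li')]

print(contains)     # ['first', 'other']
print(starts_with)  # ['first']
```

With the sample HTML from the question, both operators select the same div, so either one works there.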
from bs4 import BeautifulSoup
html = """
<div id="document_64219">
<div id="part_12">
<p class="keywords_HTAG">
<strong> Key Words : </strong>
<ol>
<li>first</li>
<li>second</li>
</ol>
</p>
<ol>
<li>third</li>
<li>fourth</li>
</ol>
"""
soup = BeautifulSoup(html, 'html.parser')
# All <li> elements inside a div whose id contains "part_"
# Returns first, second, third and fourth
lis = [li.text for li in soup.select('div[id*="part_"] li')]
print('All li ==>',lis)
# Only the <li> elements inside the p tag of such a div
# Returns first and second
lis = [li.text for li in soup.select('div[id*="part_"] p.keywords_HTAG li')]
print('with in p only ==>',lis)
# Only the <li> elements outside the p tag (in an <ol> that is a direct child of the div)
# Returns third and fourth
lis = [li.text for li in soup.select('div[id*="part_"] > ol li')]
print('without li in p ==>',lis)
Output:
All li ==> ['first', 'second', 'third', 'fourth']
with in p only ==> ['first', 'second']
without li in p ==> ['third', 'fourth']
Posted on 2020-04-14 20:42:22
import re
from bs4 import BeautifulSoup
html = """
<div id="document_64219">
<div id="part_12">
<p class="keywords_HTAG">
<strong> Key Words : </strong>
<ol>
<li>Decentralization</li>
<li>Planning</li>
</ol>
</p>
"""
soup = BeautifulSoup(html, 'html.parser')
# Match every div whose id starts with "part_"
for item in soup.find_all("div", id=re.compile("^part_")):
    print(item.find_all_next("li"))
Output:
[<li>Decentralization</li>, <li>Planning</li>]
or
# Same loop, printing only the text of each <li>
for item in soup.find_all("div", id=re.compile("^part_")):
    print([li.text for li in item.find_all_next("li")])
Output:
['Decentralization', 'Planning']
https://stackoverflow.com/questions/61207847
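One caveat about find_all_next, used in the answer above: it returns every matching tag that comes after the div in document order, not only the tags nested inside it. On the question's sample this makes no difference, but on a larger page it can pick up unrelated lists. A minimal sketch of the contrast with find_all, which is limited to descendants (the "footer" div is a hypothetical addition, not from the question):

```python
import re
from bs4 import BeautifulSoup

# The <div id="footer"> block is hypothetical, added only for illustration
html = """
<div id="part_12">
  <ol><li>Decentralization</li><li>Planning</li></ol>
</div>
<div id="footer">
  <ul><li>unrelated</li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id=re.compile("^part_"))

# find_all_next: every <li> after the start of the div, in document order
following = [li.text for li in div.find_all_next("li")]

# find_all: only the <li> elements that are descendants of the div
inside = [li.text for li in div.find_all("li")]

print(following)  # ['Decentralization', 'Planning', 'unrelated']
print(inside)     # ['Decentralization', 'Planning']
```

If only the items inside each matching div are wanted, find_all (or a CSS selector scoped to the div) is likely the safer choice.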