Q: Getting specific p tags in an HTML document

Stack Overflow user

Asked on 2020-12-03 22:26:05

Answers: 1 | Views: 50 | Followers: 0 | Votes: 0

I have some code that parses an HTML page.

from bs4 import BeautifulSoup

with open('Books-_html.txt') as page:
   soup = BeautifulSoup(page, "lxml")

Items = soup.find('div',{'class':'main'})

All_links_and_titles = Items.findAll('p')

print(All_links_and_titles)

The HTML it prints looks like this:

[<p>Looking for good philosophy books? This is my list of the best philosophy books of all-time. If you only have time to read one or two books, I recommend looking at the Top Philosophy Books section below.</p>, <p>Further down the page, you'll find more philosophy book recommendations. Many of these books are fantastic as well. I try to carefully curate all of my reading lists and you can be sure that any philosophy book on this page is worth your time. Enjoy!</p>, <p><strong>Manual for Living<br/></strong>by Epictetus<br/><a href="https://jamesclear.com/book/manual-for-living">Print</a> | <a href="https://jamesclear.com/audiobook/manual-for-living">Audiobook</a><br/><a href="https://jamesclear.com/book-summaries/manual-for-living">Read my summary of this book »</a></p>, <p><strong>Meditations</strong><br/>by Marcus Aurelius<br/><a href="https://jamesclear.com/book/meditations">Print</a> | <a href="https://jamesclear.com/ebook/meditations">eBook</a> | <a href="https://jamesclear.com/audiobook/meditations">Audiobook</a></p>, <p><strong>The Republic</strong><br/>by Plato<br/><a href="https://jamesclear.com/book/the-republic">Print</a> | <a href="https://jamesclear.com/ebook/the-republic">eBook</a> | <a href="https://jamesclear.com/audiobook/the-republic">Audiobook</a></p>, <p><strong>The Little Prince</strong><br/>by Antoine de Saint-Exupery<br/><a href="https://jamesclear.com/book/the-little-prince" title="The Little Prince by Antoine de Saint-Exupery">Print</a> | <a href="https://jamesclear.com/audiobook/the-little-prince" title="The Little Prince by Antoine de Saint-Exupery">Audiobook</a></p>, <p><strong>Free Will</strong><br/>by Sam Harris<br/><a href="https://jamesclear.com/book/free-will">Print</a> | <a href="https://jamesclear.com/ebook/free-will">eBook</a> | <a href="https://jamesclear.com/audiobook/free-will">Audiobook</a><br/><a href="https://jamesclear.com/book-summaries/free-will">Read my summary of this book »</a></p>, 
<p><strong>Candide</strong><br/>by Voltaire<br/><a href="https://jamesclear.com/book/candide" title="Candide by Voltaire">Print</a> | <a href="https://jamesclear.com/audiobook/candide" title="Candide audiobook">Audiobook</a></p>, <p>Or, <a href="https://jamesclear.com/best-books" title="Browse all book recommendations.">browse all book recommendations</a>.</p>, <p>]

From this I need to get the p tags that contain book titles, for example Meditations, The Little Prince, and so on.

 <p><strong>Meditations</strong><br


<p><strong>The Little Prince</strong><br/

The code after print(All_links_and_titles) should look something like this:

for Only_titles in All_links_and_titles:
    Only_titles = All_links_and_titles.find( ????)
    print(Only_titles)

Nothing I have tried so far works. I need help. Thank you in advance.

beautifulsoup

python

1 Answer

Stack Overflow user

Accepted answer

Answered on 2020-12-03 22:41:15

Try the CSS selector p strong, which selects every <strong> tag nested inside a <p> tag.

from bs4 import BeautifulSoup

html = """<p>Looking for good philosophy books? [And on..]">browse all book recommendations</a>.</p>, <p>"""
soup = BeautifulSoup(html, "html.parser")

for tag in soup.select("p strong"):
    print(tag.text)

Output:

Manual for Living
Meditations
The Republic
The Little Prince
Free Will
Candide

In your case:

for tag in All_links_and_titles:
    title = tag.select_one("p strong")
    # select_one returns None when nothing matches, so only read `.text` if a tag was found
    if title:
        print(title.text)
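
The question asks for the p tags themselves rather than only the title text. One possible way to keep them is sketched below; it is an illustration, not part of the original answer. The html string is a shortened, hypothetical sample modelled on the markup in the question, and the names main, book_paragraphs and titles are made up for this example. It assumes every book entry, and no other paragraph, contains a <strong> tag.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample modelled on the markup shown in the question.
html = """
<div class="main">
<p>Looking for good philosophy books?</p>
<p><strong>Meditations</strong><br/>by Marcus Aurelius<br/>
<a href="https://jamesclear.com/book/meditations">Print</a></p>
<p><strong>The Little Prince</strong><br/>by Antoine de Saint-Exupery<br/>
<a href="https://jamesclear.com/book/the-little-prince">Print</a></p>
<p>Or, <a href="https://jamesclear.com/best-books">browse all book recommendations</a>.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
main = soup.find("div", class_="main")

# Keep only the <p> tags that have a <strong> descendant (assumed to be the book entries).
book_paragraphs = [p for p in main.find_all("p") if p.find("strong")]

# Pull the title text out of each paragraph that was kept.
titles = [p.find("strong").get_text(strip=True) for p in book_paragraphs]

print(titles)  # ['Meditations', 'The Little Prince']
```

Each element of book_paragraphs is still a full Tag object, so the links inside it remain available through calls such as p.find_all("a").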

Votes: 1

Original page content provided by Stack Overflow. The Chinese translation was provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/65135108
