
Getting specific p tags from an HTML document

Asked by a Stack Overflow user on 2020-12-03 22:26:05
1 answer, 50 views, 0 followers, 0 votes

I have some code that parses an HTML page:

from bs4 import BeautifulSoup

with open('Books-_html.txt') as page:
    soup = BeautifulSoup(page, "lxml")

Items = soup.find('div', {'class': 'main'})

All_links_and_titles = Items.find_all('p')

print(All_links_and_titles)

Printing it gives the following HTML:

[<p>Looking for good philosophy books? This is my list of the best philosophy books of all-time. If you only have time to read one or two books, I recommend looking at the Top Philosophy Books section below.</p>, <p>Further down the page, you'll find more philosophy book recommendations. Many of these books are fantastic as well. I try to carefully curate all of my reading lists and you can be sure that any philosophy book on this page is worth your time. Enjoy!</p>, <p><strong>Manual for Living<br/></strong>by Epictetus<br/><a href="https://jamesclear.com/book/manual-for-living">Print</a> | <a href="https://jamesclear.com/audiobook/manual-for-living">Audiobook</a><br/><a href="https://jamesclear.com/book-summaries/manual-for-living">Read my summary of this book »</a></p>, <p><strong>Meditations</strong><br/>by Marcus Aurelius<br/><a href="https://jamesclear.com/book/meditations">Print</a> | <a href="https://jamesclear.com/ebook/meditations">eBook</a> | <a href="https://jamesclear.com/audiobook/meditations">Audiobook</a></p>, <p><strong>The Republic</strong><br/>by Plato<br/><a href="https://jamesclear.com/book/the-republic">Print</a> | <a href="https://jamesclear.com/ebook/the-republic">eBook</a> | <a href="https://jamesclear.com/audiobook/the-republic">Audiobook</a></p>, <p><strong>The Little Prince</strong><br/>by Antoine de Saint-Exupery<br/><a href="https://jamesclear.com/book/the-little-prince" title="The Little Prince by Antoine de Saint-Exupery">Print</a> | <a href="https://jamesclear.com/audiobook/the-little-prince" title="The Little Prince by Antoine de Saint-Exupery">Audiobook</a></p>, <p><strong>Free Will</strong><br/>by Sam Harris<br/><a href="https://jamesclear.com/book/free-will">Print</a> | <a href="https://jamesclear.com/ebook/free-will">eBook</a> | <a href="https://jamesclear.com/audiobook/free-will">Audiobook</a><br/><a href="https://jamesclear.com/book-summaries/free-will">Read my summary of this book »</a></p>, 
<p><strong>Candide</strong><br/>by Voltaire<br/><a href="https://jamesclear.com/book/candide" title="Candide by Voltaire">Print</a> | <a href="https://jamesclear.com/audiobook/candide" title="Candide audiobook">Audiobook</a></p>, <p>Or, <a href="https://jamesclear.com/best-books" title="Browse all book recommendations.">browse all book recommendations</a>.</p>, <p>]

From this, I need to get the p tags that contain the book titles, for example Meditations, The Little Prince, and so on:

 <p><strong>Meditations</strong><br


<p><strong>The Little Prince</strong><br/

The code after print(All_links_and_titles) should look something like this:

for Only_titles in All_links_and_titles:
    Only_titles = All_links_and_titles.find( ????)
    print(Only_titles)

Nothing I have tried so far works. Any help is appreciated, thanks in advance.


1 Answer

Answered by a Stack Overflow user (accepted answer)

Posted on 2020-12-03 22:41:15

Try the CSS selector "p strong", which selects every <strong> tag nested inside a <p> tag.

from bs4 import BeautifulSoup

html = """<p>Looking for good philosophy books? [And on..]">browse all book recommendations</a>.</p>, <p>"""
soup = BeautifulSoup(html, "html.parser")

for tag in soup.select("p strong"):
    print(tag.text)

Output:

Manual for Living
Meditations
The Republic
The Little Prince
Free Will
Candide
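The selector above uses the descendant combinator (the space in "p strong"), so it matches a <strong> at any depth inside a <p>, not only a direct child. The small sketch below contrasts it with the child combinator "p > strong". The markup is made up for illustration and is not taken from the question's page; for the book entries shown in the question both selectors would work, since each <strong> sits directly inside its <p>.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not from the question: the second <strong>
# is wrapped in a <span>, so it is a grandchild of its <p>, not a direct child.
html = """
<p><strong>Meditations</strong><br/>by Marcus Aurelius</p>
<p><span><strong>The Republic</strong></span><br/>by Plato</p>
<div><strong>Not a title</strong></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: a <strong> anywhere inside a <p>.
descendants = [tag.get_text() for tag in soup.select("p strong")]

# Child combinator: a <strong> only when it is a direct child of a <p>.
children = [tag.get_text() for tag in soup.select("p > strong")]

print(descendants)  # ['Meditations', 'The Republic']
print(children)     # ['Meditations']
```

The <strong> inside the <div> is excluded by both selectors because it has no <p> ancestor.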

In your case:

for tag in All_links_and_titles:
    title = tag.select_one("p strong")
    # select_one returns None when a <p> has no <strong>,
    # so only read `.text` when a title was found
    if title:
        print(title.text)
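If you want to keep the <p> tags themselves (as the question asks) rather than just their text, one option is to pass a filter function to find_all. This is a minimal sketch, assuming every book entry is a <p> whose title sits in a <strong>, as in the printed output above. The HTML is an abridged excerpt of that output, and the helper name has_title is hypothetical.

```python
from bs4 import BeautifulSoup

# Abridged excerpt of the markup printed in the question.
html = """
<div class="main">
<p>Looking for good philosophy books?</p>
<p><strong>Meditations</strong><br/>by Marcus Aurelius<br/><a href="https://jamesclear.com/book/meditations">Print</a></p>
<p><strong>The Little Prince</strong><br/>by Antoine de Saint-Exupery<br/><a href="https://jamesclear.com/book/the-little-prince">Print</a></p>
<p>Or, <a href="https://jamesclear.com/best-books">browse all book recommendations</a>.</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
items = soup.find("div", {"class": "main"})


def has_title(tag):
    """Hypothetical filter: True for <p> tags that contain a <strong>."""
    return tag.name == "p" and tag.find("strong") is not None


# find_all accepts a function and keeps every tag for which it returns True.
title_paragraphs = items.find_all(has_title)

for p in title_paragraphs:
    # Each result is the whole <p> tag; p.strong is its first <strong>.
    print(p.strong.get_text())
```

The intro paragraph and the "browse all book recommendations" paragraph are dropped because they contain no <strong>, while the matching <p> tags are kept intact, so their links remain available.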
Votes: 1
The original page content is provided by Stack Overflow. Translation support is provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/65135108
