I have some code that parses an HTML page:

from bs4 import BeautifulSoup

with open('Books-_html.txt') as page:
    soup = BeautifulSoup(page, "lxml")

Items = soup.find('div', {'class': 'main'})
All_links_and_titles = Items.findAll('p')
print(All_links_and_titles)

The HTML left after printing is as follows:
[<p>Looking for good philosophy books? This is my list of the best philosophy books of all-time. If you only have time to read one or two books, I recommend looking at the Top Philosophy Books section below.</p>, <p>Further down the page, you'll find more philosophy book recommendations. Many of these books are fantastic as well. I try to carefully curate all of my reading lists and you can be sure that any philosophy book on this page is worth your time. Enjoy!</p>, <p><strong>Manual for Living<br/></strong>by Epictetus<br/><a href="https://jamesclear.com/book/manual-for-living">Print</a> | <a href="https://jamesclear.com/audiobook/manual-for-living">Audiobook</a><br/><a href="https://jamesclear.com/book-summaries/manual-for-living">Read my summary of this book »</a></p>, <p><strong>Meditations</strong><br/>by Marcus Aurelius<br/><a href="https://jamesclear.com/book/meditations">Print</a> | <a href="https://jamesclear.com/ebook/meditations">eBook</a> | <a href="https://jamesclear.com/audiobook/meditations">Audiobook</a></p>, <p><strong>The Republic</strong><br/>by Plato<br/><a href="https://jamesclear.com/book/the-republic">Print</a> | <a href="https://jamesclear.com/ebook/the-republic">eBook</a> | <a href="https://jamesclear.com/audiobook/the-republic">Audiobook</a></p>, <p><strong>The Little Prince</strong><br/>by Antoine de Saint-Exupery<br/><a href="https://jamesclear.com/book/the-little-prince" title="The Little Prince by Antoine de Saint-Exupery">Print</a> | <a href="https://jamesclear.com/audiobook/the-little-prince" title="The Little Prince by Antoine de Saint-Exupery">Audiobook</a></p>, <p><strong>Free Will</strong><br/>by Sam Harris<br/><a href="https://jamesclear.com/book/free-will">Print</a> | <a href="https://jamesclear.com/ebook/free-will">eBook</a> | <a href="https://jamesclear.com/audiobook/free-will">Audiobook</a><br/><a href="https://jamesclear.com/book-summaries/free-will">Read my summary of this book »</a></p>, 
<p><strong>Candide</strong><br/>by Voltaire<br/><a href="https://jamesclear.com/book/candide" title="Candide by Voltaire">Print</a> | <a href="https://jamesclear.com/audiobook/candide" title="Candide audiobook">Audiobook</a></p>, <p>Or, <a href="https://jamesclear.com/best-books" title="Browse all book recommendations.">browse all book recommendations</a>.</p>, <p>]

From this I need to get the p tags that contain book titles, for example Meditations, The Little Prince, and so on:
<p><strong>Meditations</strong><br
<p><strong>The Little Prince</strong><br/

The code after the print (All_links_and_titles) should look something like this:
for Only_titles in All_links_and_titles:
    Only_titles = All_links_and_titles.find( ????)
    print(Only_titles)

Nothing has worked so far. I need help. Thanks in advance.
Posted on 2020-12-03 22:41:15
Try the CSS selector "p strong", which selects every <strong> tag inside a <p> tag.
from bs4 import BeautifulSoup

html = """<p>Looking for good philosophy books? [And on..]">browse all book recommendations</a>.</p>, <p>"""
soup = BeautifulSoup(html, "html.parser")

for tag in soup.select("p strong"):
    print(tag.text)

Output:
Manual for Living
Meditations
The Republic
The Little Prince
Free Will
Candide

In your example:
for tag in All_links_and_titles:
    title = tag.select_one("p strong")
    # Only read `.text` when a <strong> tag was found (select_one returns None otherwise)
    if title:
        print(title.text)

https://stackoverflow.com/questions/65135108
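Since the variable in the question is named for both links and titles, here is a minimal sketch that collects each title together with the links from the same paragraph. It assumes every book entry is a <p> inside div.main with the title in a <strong> tag. SAMPLE_HTML and extract_books are hypothetical names introduced for illustration, and the sample is a shortened copy of two entries from the question's output.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two entries shortened from the question's output,
# plus one intro paragraph that has no <strong> tag.
SAMPLE_HTML = """
<div class="main">
<p>Looking for good philosophy books?</p>
<p><strong>Meditations</strong><br/>by Marcus Aurelius<br/>
<a href="https://jamesclear.com/book/meditations">Print</a> |
<a href="https://jamesclear.com/audiobook/meditations">Audiobook</a></p>
<p><strong>The Little Prince</strong><br/>by Antoine de Saint-Exupery<br/>
<a href="https://jamesclear.com/book/the-little-prince">Print</a></p>
</div>
"""


def extract_books(html):
    """Return a list of (title, links) pairs, one per <p> that has a <strong>."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for paragraph in soup.select("div.main p"):
        title_tag = paragraph.find("strong")
        if title_tag is None:
            continue  # intro or footer paragraph, not a book entry
        # Map each link label (e.g. "Print") to its URL
        links = {
            a.get_text(strip=True): a["href"]
            for a in paragraph.find_all("a", href=True)
        }
        books.append((title_tag.get_text(strip=True), links))
    return books


books = extract_books(SAMPLE_HTML)
for title, links in books:
    print(title, links)
```

Paragraphs without a <strong> tag, such as the introduction, are skipped, so only book entries end up in the result.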