Q: Iterating over the "class" attribute in HTML in Python?

Stack Overflow user

Asked on 2013-01-10 15:14:11

Answers: 1 · Views: 85 · Followers: 0 · Votes: 1

I have an HTML string from a website. Here is part of it:

<p class="news-body">
<a href="/ci/content/player/45568.html" target="new">Paul Harris,</a> the South African spinner, is to retire at the end of the season, bringing to an end a 14-year first-class career.
</p>
<p class="news-body">
 Harris played 37 Tests for South Africa with his slow-left arm but nearly turned his back on international cricket after a stint as a Kolpak with Warwickshire in 2006. The retirement of Nicky Boje prompted Harris' eventual call-up and he went on to take 103 wickets at 37.87.
</p>
<p class="news-body">
His last Test was in Cape Town against India in January 2011 after which he was dropped for legspinner Imran Tahir. As recently as the start of this season he indicated his intention to compete for a Test place once again.
</p>  </div>
   <!-- body area ends here  -->

I want to extract all of the text above, i.e. the text contained in every <p class="news-body"> element.

I used BeautifulSoup:

# BeautifulSoup 3 on Python 2
from BeautifulSoup import BeautifulSoup

html = "..."  # the HTML string shown above
parsed_html = BeautifulSoup(html)
# Look up <p class="news-body"> and print its text
print parsed_html.body.find('p', attrs={'class': 'news-body'}).text

Unfortunately, the code above returns only the first paragraph, namely:

Paul Harris,the South African spinner, is to retire at the end of the season, bringing to an end a 14-year first-class career.

I want it to return all of the text.

Tags: python, html, beautifulsoup

1 Answer

Stack Overflow user

Answered on 2013-01-10 15:19:19

find only returns the first matching element. You need findAll, which returns a list of all matching elements.
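To illustrate the difference, here is a small sketch assuming Beautiful Soup 4 (the bs4 package) is installed; there the method is spelled find_all, with findAll kept as a legacy alias. The three-paragraph HTML is a hypothetical sample, not the page from the question.

```python
# Sketch assuming Beautiful Soup 4 (pip install beautifulsoup4).
from bs4 import BeautifulSoup

# Hypothetical sample with three matching paragraphs.
html = """
<p class="news-body">First paragraph.</p>
<p class="news-body">Second paragraph.</p>
<p class="news-body">Third paragraph.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match and returns a single Tag (or None).
first = soup.find("p", attrs={"class": "news-body"})
print(first.text)  # First paragraph.

# find_all() returns every match as a list-like ResultSet.
matches = soup.find_all("p", attrs={"class": "news-body"})
print(len(matches))  # 3
print([p.text for p in matches])
```

Because find_all returns a list, iterating over it (as in the join below) visits every paragraph instead of only the first.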

You can join their text together like this:

text = '\n'.join(element.text for element in soup.findAll('p', ...))
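A self-contained version of that join, written as a sketch that assumes Beautiful Soup 4 and the built-in "html.parser" backend. The helper name news_body_text is hypothetical, and the sample is a shortened excerpt of the question's HTML. Each paragraph's text is stripped so the newlines just inside every <p> tag do not pile up in the output.

```python
# Sketch assuming Beautiful Soup 4 (pip install beautifulsoup4).
from bs4 import BeautifulSoup


def news_body_text(html):
    """Hypothetical helper: text of every <p class="news-body">, one per line."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns every matching <p>, not only the first one.
    paragraphs = soup.find_all("p", attrs={"class": "news-body"})
    # strip() drops the newline/indentation just inside each <p> tag.
    return "\n".join(p.get_text().strip() for p in paragraphs)


# Shortened excerpt of the question's HTML, for illustration only.
sample = """
<p class="news-body">
<a href="/ci/content/player/45568.html" target="new">Paul Harris,</a> the South African spinner, is to retire at the end of the season.
</p>
<p class="news-body">
 Harris played 37 Tests for South Africa with his slow-left arm.
</p>
"""

print(news_body_text(sample))
```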

Also, I recommend upgrading to the latest version of BeautifulSoup (Beautiful Soup 4, imported as bs4).
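After upgrading, Beautiful Soup 4 also accepts CSS selectors through select(), and get_text() takes a separator and a strip flag. The sketch below assumes bs4 with the "html.parser" backend and reuses a shortened excerpt of the question's HTML. Joining the stripped fragments with a space keeps the link text apart from the words that follow, so the first paragraph reads "Paul Harris, the South African spinner, ..." rather than the run-together "Paul Harris,the" shown in the question.

```python
# Sketch assuming Beautiful Soup 4 (pip install beautifulsoup4).
from bs4 import BeautifulSoup

# Shortened excerpt of the question's HTML, for illustration only.
html = """
<p class="news-body">
<a href="/ci/content/player/45568.html" target="new">Paul Harris,</a> the South African spinner, is to retire at the end of the season.
</p>
<p class="news-body">
 Harris played 37 Tests for South Africa with his slow-left arm.
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# "p.news-body" matches every <p> whose class list contains news-body.
# get_text(" ", strip=True) strips each text fragment, drops empty ones,
# and joins the remaining fragments with a single space.
texts = [p.get_text(" ", strip=True) for p in soup.select("p.news-body")]

for text in texts:
    print(text)
```

The selector form is convenient when the match depends on nesting (for example "div p.news-body"); for a single class, find_all with attrs works just as well.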

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/14252609
