
How to extract the complete text inside a paragraph tag

Stack Overflow user
Asked 2013-05-06 18:40:10
1 answer · 4K views · 0 followers · 2 votes

I need to extract the complete text from the following HTML, without the markup itself (<p>, <a href>, rel, and so on).

<p>Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080 <a href="http://www.bubblews.com/news/421662-samsung-galaxy-s4-worlds-first-full-hd-super-amoled-display" rel="nofollow" target="_blank">Full HD Super AMOLED display</a>, its powerful processors (<a href="http://www.samsung.com/global/business/semiconductor/minisite/Exynos/blog_Spotlight_on_the_Exynos5Octa.html" rel="nofollow" target="_blank">Samsung Exynos 5 Octa</a> in the international version and <a href="http://www.qualcomm.com/snapdragon/blog/topics/snapdragon 600" rel="nofollow" target="_blank">Qualcomm Snapdragon 600</a> in the U.S. version) and 16GB, 32GB and 64GB storage options -- are now bringing grief to those who rushed to purchase the fourth-generation Galaxy S series smartphone upon its late April release.</p>

I have tried the following code:

from bs4 import BeautifulSoup
from urllib2 import urlopen

BASE_URL = "http://www.chicagoreader.com"

def get_category_links(section_url):
    html = urlopen(section_url).read()
    soup = BeautifulSoup(html, "lxml")
    for div in soup.find_all("div", attrs={'class': 'field-content'}):
        # .contents[0] is only the first child node of the <p>
        print div.find("p").contents[0]

But it gives the following output:

Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080

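I can't get the complete text: the output should also include the text inside and after the <a href ... rel ...> tags. Please suggest how I can get the following output: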

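Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080 Full HD Super AMOLED display, its powerful processors (Samsung Exynos 5 Octa in the international version and Qualcomm Snapdragon 600 in the U.S. version) and 16GB, 32GB and 64GB storage options -- are now bringing grief to those who rushed to purchase the fourth-generation Galaxy S series smartphone upon its late April release.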

Thanks.


1 Answer

Stack Overflow user

Answered 2013-05-06 18:49:54

You can use .text:

>>> from bs4 import BeautifulSoup
>>> html = '<p>Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080 <a href="http://www.bubblews.com/news/421662-samsung-galaxy-s4-worlds-first-full-hd-super-amoled-display" rel="nofollow" target="_blank">Full HD Super AMOLED display</a>, its powerful processors (<a href="http://www.samsung.com/global/business/semiconductor/minisite/Exynos/blog_Spotlight_on_the_Exynos5Octa.html" rel="nofollow" target="_blank">Samsung Exynos 5 Octa</a> in the international version and <a href="http://www.qualcomm.com/snapdragon/blog/topics/snapdragon 600" rel="nofollow" target="_blank">Qualcomm Snapdragon 600</a> in the U.S. version) and 16GB, 32GB and 64GB storage options -- are now bringing grief to those who rushed to purchase the fourth-generation Galaxy S series smartphone upon its late April release.</p>'
>>> soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
>>> print soup.p.text
Many of the features that made the Samsung Galaxy S4 one of the most anticipated phones in recent history -- such as its 5-inch 1920 x 1080 Full HD Super AMOLED display, its powerful processors (Samsung Exynos 5 Octa in the international version and Qualcomm Snapdragon 600 in the U.S. version) and 16GB, 32GB and 64GB storage options -- are now bringing grief to those who rushed to purchase the fourth-generation Galaxy S series smartphone upon its late April release.
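For readers on Python 3, here is a minimal sketch of why the question's approach stopped at "1920 x 1080" while .text does not. It assumes Beautiful Soup 4 with the built-in "html.parser"; the shortened HTML and the example.com URL are made up for illustration and are not from the original page. The <p> has three children (a string, an <a> tag, another string), so .contents[0] is only the first string, whereas get_text() (which .text is a shortcut for) joins every string in the subtree.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the paragraph from the question.
html = (
    '<p>its 5-inch 1920 x 1080 '
    '<a href="http://example.com/display" rel="nofollow">Full HD Super AMOLED display</a>'
    ', its powerful processors</p>'
)

soup = BeautifulSoup(html, "html.parser")
p = soup.p

# The direct children of <p>: text node, <a> tag, text node.
first_child = p.contents[0]

# get_text() concatenates all text in the subtree, including text inside <a>.
full_text = p.get_text()

print(repr(str(first_child)))
print(full_text)
```

If whitespace around nested tags matters, get_text() also accepts a separator string and a strip flag, which may be worth experimenting with.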
3 votes
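Applied to the loop from the question, the same idea might look like the sketch below. This is an illustration under assumptions, not the asker's final code: the function name extract_paragraph_texts and the sample markup are hypothetical, the page download is left out so the example runs offline, and "html.parser" is used instead of lxml to avoid an extra dependency.

```python
from bs4 import BeautifulSoup


def extract_paragraph_texts(html):
    """Return the full text of the first <p> in each div.field-content."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for div in soup.find_all("div", class_="field-content"):
        p = div.find("p")
        if p is None:
            # Skip divs that contain no paragraph instead of raising an error.
            continue
        texts.append(p.get_text())
    return texts


# Hypothetical sample markup shaped like the page in the question.
sample = (
    '<div class="field-content"><p>Powered by the '
    '<a href="http://example.com/exynos" rel="nofollow">Samsung Exynos 5 Octa</a>'
    ' processor.</p></div>'
    '<div class="field-content">no paragraph here</div>'
)

print(extract_paragraph_texts(sample))
```

Returning a list rather than printing inside the loop keeps the extraction separate from whatever is done with the text afterwards.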
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/16396982
