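I am trying to fetch a news article from this link. My code is:
import requests
from bs4 import BeautifulSoup

def get_news_details(news_url):
    source = requests.get(news_url)
    plain_text = source.text
    soup = BeautifulSoup(plain_text, "html.parser")
    # Grab the container that holds the article image and body text
    content = soup.findAll('div', {'class': 'big-img-box'})
    print(content[0].findAll('p'))
The output is:
[<p></p>, <p></p>, <p></p>, <p></p>, <p></p>, <p></p>]
The value of content is: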
<div class="big-img-box">
<div class="left-imgs">
<figure>
<img alt="iOS developer hints possibility of 4K Apple TV" class="img-responsive" src="http://www.aninews.in/contentimages/detail/appletv.jpg"/>
<figcaption><span class="heading-inner-span"></span></figcaption>
</figure>
<div class="mb10"></div>
</div>
<p></p> New York [USA], August 6 <a class="highlights" href="http://aninews.in/" target="_blank">(ANI)</a>: The latest designs from Apple's HomePod firmware revealed that the tech giant is hinting the launch of a <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/4k-apple-tv.html"> 4K Apple TV</a></span> with high dynamic range (HDR) support for both <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/hdr10.html"> HDR10 </a></span> and <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/dolby-vision.html"> Dolby Vision</a></span>.<p></p> While the current range of Apple's TV set-top box is incompatible to 4K technology, <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/ios.html">iOS</a></span> developer <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/guilherme-rambo.html"> Guilherme Rambo</a></span> revealed that the company is hinting an adoption of the ultra high-definition format, reports <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/the-verge.html">The Verge</a></span>.<p></p> Reports of the new range of Apple TV have surfaced time and again over the past few months, starting February this year.<p></p> It is said that implementing the HDR and 4K content will prove to b beneficial for the company, rather than a simpler resolution, since popular online movie and television platforms like <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/netflix.html"> Netflix</a></span> and <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/amazon.html"> Amazon</a></span> support the two high-definition formats.<p></p> Last month, iTunes started listing movies as supporting 4K and <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/hdr.html"> HDR</a></span> in users' purchase histories, thus providing more thrust to the speculations of the 4K 
<span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/apple.html"> Apple</a></span> TV. <a class="highlights" href="http://aninews.in/" target="_blank">(ANI)</a><p></p>
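</div>
I can get a somewhat clumsy version of the article from content[0].text, but I cannot format it.
When I inspect the page in Chrome, the article appears to be written inside <p>article_text</p> tags, whereas in content it shows up as <p></p>article_text. If the former version were in the soup, I could get the output I want. What should I do?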
Posted on 2017-08-16 02:22:48
It depends on what you mean by formatting. You can make the text "tidier" in a fairly simple way.
>>> import bs4
>>> import requests
>>> page = requests.get('http://www.aninews.in/newsdetail-Nw/MzI4NDIy/ios-developer-hints-possibility-of-4k-apple-tv.html').content
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> big_img_box = soup.select('.big-img-box')
Get all of the text and strip the leading and trailing whitespace.
>>> big_img_box[0].text.strip()
"New York [USA], August 6 (ANI): The latest designs from Apple's HomePod firmware revealed that the tech giant is hinting the launch of a 4K Apple TV with high dynamic range (HDR) support for both HDR10 and Dolby Vision. While the current range of Apple's TV set-top box is incompatible to 4K technology, iOS developer Guilherme Rambo revealed that the company is hinting an adoption of the ultra high-definition format, reports The Verge. Reports of the new range of Apple TV have surfaced time and again over the past few months, starting February this year. It is said that implementing the HDR and 4K content will prove to b beneficial for the company, rather than a simpler resolution, since popular online movie and television platforms like Netflix and Amazon support the two high-definition formats. Last month, iTunes started listing movies as supporting 4K and HDR in users' purchase histories, thus providing more thrust to the speculations of the 4K Apple TV. (ANI)"
Building on that, collapse the longer runs of internal whitespace into single spaces.
>>> import re
>>> re.sub(r'\s{2,}', ' ', big_img_box[0].text.strip())
"New York [USA], August 6 (ANI): The latest designs from Apple's HomePod firmware revealed that the tech giant is hinting the launch of a 4K Apple TV with high dynamic range (HDR) support for both HDR10 and Dolby Vision. While the current range of Apple's TV set-top box is incompatible to 4K technology, iOS developer Guilherme Rambo revealed that the company is hinting an adoption of the ultra high-definition format, reports The Verge. Reports of the new range of Apple TV have surfaced time and again over the past few months, starting February this year. It is said that implementing the HDR and 4K content will prove to b beneficial for the company, rather than a simpler resolution, since popular online movie and television platforms like Netflix and Amazon support the two high-definition formats. Last month, iTunes started listing movies as supporting 4K and HDR in users' purchase histories, thus providing more thrust to the speculations of the 4K Apple TV. (ANI)"
https://stackoverflow.com/questions/45698505
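If "formatting" means getting the article back as separate paragraphs, one option is to treat each empty <p></p> as a paragraph break and collect the text that sits between them. The following is only a rough sketch under assumptions: SAMPLE_HTML is a shortened, made-up stand-in for the markup quoted in the question, and split_paragraphs is a hypothetical helper, not something provided by Beautiful Soup. It also assumes the image block is the only nested <div> that should be skipped.

```python
import re

from bs4 import BeautifulSoup, NavigableString, Tag

# Shortened stand-in for the markup quoted in the question (assumption: the
# real page follows the same "<p></p> text <p></p> text" pattern).
SAMPLE_HTML = """
<div class="big-img-box">
  <div class="left-imgs"><figure><img alt="x" src="x.jpg"/></figure></div>
  <p></p> New York [USA], August 6 <a class="highlights" href="#">(ANI)</a>: First
  paragraph with a <span class="highlights"><a href="#"> 4K Apple TV</a></span> link.<p></p>
  Second    paragraph.<p></p> Third paragraph. <a class="highlights" href="#">(ANI)</a><p></p>
</div>
"""


def split_paragraphs(box):
    """Hypothetical helper: split the direct children of `box` at each empty <p>."""
    paragraphs, current = [], []
    for child in box.children:
        if isinstance(child, Tag) and child.name == "p" and not child.get_text(strip=True):
            # An empty <p></p> marks a paragraph boundary.
            paragraphs.append("".join(current))
            current = []
        elif isinstance(child, NavigableString):
            # Bare text sitting directly inside the container.
            current.append(str(child))
        elif isinstance(child, Tag) and child.name != "div":
            # Inline tags such as <a> and <span>; the image <div> is skipped.
            current.append(child.get_text())
    paragraphs.append("".join(current))
    # Collapse whitespace runs and drop chunks that end up empty.
    cleaned = (re.sub(r"\s+", " ", p).strip() for p in paragraphs)
    return [p for p in cleaned if p]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
box = soup.select_one(".big-img-box")
for paragraph in split_paragraphs(box):
    print(paragraph)
```

With the sample markup this prints each of the three paragraphs on its own line. On the real page you would pass content[0] (or big_img_box[0]) instead of box, and the tag names to skip may need adjusting if the layout differs from the excerpt above.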