StopIteration error?

Stack Overflow user
Asked 2015-03-07 16:01:04

I'm trying to scrape url = 'http://m.imdb.com/feature/bornondate' to display the names of the 10 celebrities shown on that page. However, Python raises StopIteration instead of printing my results.

Here is my code, which I think explains what I'm trying to do.

import urllib2
from bs4 import BeautifulSoup

url = 'http://m.imdb.com/feature/bornondate'

test_url = urllib2.urlopen(url)
readHtml = test_url.read()
test_url.close()

soup = BeautifulSoup(readHtml)
# Used to track the number of celebrities
count = 0
# Fetching the value present within tag results
celebrities = soup.findChildren('section', 'posters list')
# Changing the celebrity into an iterator
itercelebrity = iter(celebrities[0].findChildren('a'))
# Skipping the first value of the iterator as it does not have the required info
next(itercelebrity)

# Finding a in itercelebrity. Every a tag contains information of a celebrity
for a in itercelebrity:

    celebrity = a.findChildren('div', 'label')
    name = celebrity[0].find('span', 'title').contents[0]

    print '*******************************IMDB CELEBRITYS***********************************'
    # Printing the Name of the celebrity
    print 'Name --> ' + name

Here is the output (nothing gets printed):

Patricks-MacBook-Pro:~ Patrick$ python /Users/Patrick/Desktop/IMDB_BornToday_Scraping.py
Traceback (most recent call last):
  File "/Users/Patrick/Desktop/IMDB_BornToday_Scraping.py", line 20, in <module>
    next(itercelebrity)
StopIteration
Patricks-MacBook-Pro:~ Patrick$ 

In case you can't tell already, I'm quite new to this :) Here is the relevant HTML I'm trying to extract:

<section class="posters list">
<h1>March 7</h1>

<a href="/name/nm0186505/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BMTA2NjEyMTY4MTVeQTJeQWpwZ15BbWU3MDQ5NDAzNDc@._V1._CR0,0,1369,2019_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">Bryan Cranston</span><div class="detail">Actor, "Ozymandias"</div></div></a><a href="/name/nm0696059/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BNjUxNjcxMjE4N15BMl5BanBnXkFtZTgwNDk4NjA2MzE@._V1._CR156,0,1736,2560_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">Laura Prepon</span><div class="detail">Actress, "Karla"</div></div></a><a href="/name/nm0001838/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BMTQ4MzM1MDAwMV5BMl5BanBnXkFtZTcwNTU4NzQwMw@@._V1._CR5,0,271,400_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">Rachel Weisz</span><div class="detail">Actress, "The Mummy"</div></div></a><a href="/name/nm0765597/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BMjE0Mjg0NzE2Nl5BMl5BanBnXkFtZTcwMDE1MTkxMw@@._V1._CR19,0,271,400_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">Peter Sarsgaard</span><div class="detail">Actor, "Jarhead"</div></div></a><a href="/name/nm0278979/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BMTMyOTYzODQ5MF5BMl5BanBnXkFtZTcwMjE3MDgzMQ@@._V1._CR24,0,271,400_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">Jenna Fischer</span><div class="detail">Actress, "Blades of Glory"</div></div></a><a 
href="/name/nm0614220/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BMzE2OTAwNzM0Ml5BMl5BanBnXkFtZTcwNzE1MDg0Mw@@._V1._CR26,0,488,720_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">Donna Murphy</span><div class="detail">Actress, "Tangled"</div></div></a><a href="/name/nm0862328/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BMTI0OTMzMzE0N15BMl5BanBnXkFtZTcwMjI1MzYyMQ@@._V1._CR33,0,235,346_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">T.J. Thyne</span><div class="detail">Actor, "How the Grinch Stole Christmas"</div></div></a><a href="/name/nm0001334/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BNzczODkyNzY4OV5BMl5BanBnXkFtZTcwNTU0NjQzMQ@@._V1._CR41,0,368,543_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">John Heard</span><div class="detail">Actor, "Home Alone"</div></div></a><a href="/name/nm1017524/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BMTg4MjU2MzA2OV5BMl5BanBnXkFtZTgwOTIxMjc4MjE@._V1._CR0,0,3644,5375_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">Audrey Marie Anderson</span><div class="detail">Actress, "Beerfest"</div></div></a><a href="/name/nm0891216/" class="poster "><img src="http://ia.media-imdb.com/images/M/MV5BMTQyOTc5NzA0M15BMl5BanBnXkFtZTYwODQ2MjYz._V1._CR0,0,266,392_SX40_SY59.jpg" style="background:url('http://i.media-imdb.com/images/mobile/people-40x59-fade.png')" width="40" height="59"><div class="label"><span class="title">Matthew Vaughn</span><div class="detail">Producer, "Kick-Ass"</div></div></a><div 
class="paginator"><a class="next" data-start="10" href="#page10">Show more...</a><a class="seeAll" href="#showAll">See all</a></div></section>

1 Answer

Stack Overflow user

Answered 2015-03-07 16:04:24

The error occurs because the following call:

celebrities[0].findChildren('a')

returns no results, which leaves you with an iterator equivalent to:

it = iter([])
next(it)

This raises the same exception:

>>> it = iter([])
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
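
As a minimal sketch of how to guard against this, `next()` accepts an optional second argument that is returned instead of raising when the iterator is exhausted. The names `links` and `first` are illustrative only; the empty list stands in for a `findChildren('a')` call that matched nothing. The snippet uses Python 3 print syntax, whereas the question's code is Python 2.

```python
# An empty iterator has nothing to yield, so a bare next() would raise StopIteration.
links = iter([])  # stand-in for iter(celebrities[0].findChildren('a')) with no matches

# Passing a default as the second argument makes next() return it instead of raising.
first = next(links, None)

if first is None:
    print("No <a> tags found; the section is probably empty")
else:
    print("Skipping first link:", first)
```

This only turns the exception into a check you can act on; it does not explain why the list is empty in the first place.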

A better approach is to use CSS selectors through the soup.select() method:

for name in soup.select("section.posters.list a.poster div.label span.title"):
    print name.string

This would print all of the names, although the selector may be more specific than necessary.

But that doesn't work either, and I've figured out why. Look at the HTML that is actually returned when the page is fetched:

<section class="posters list">
<h1>&nbsp;</h1>
<span class="loading"></span>
</section>

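The section does not contain the content you want to extract. That content is loaded afterwards by an AJAX request, which is issued from here: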

<script language="javascript" type="text/javascript">
$(document).ready(function() {
    var pagination = $('section.posters').itemPagination(10)
    var now = new Date();

    var client = new IMDbClient();
    client.useSessionCache(true);
    client.call('/feature/bornondate_json?today='+now.toYYYYMMDD(), function(data) {
        pagination(data.list);
    });
    var months = ['January','February','March','April','May','June','July','August','September','October','November','December'];
    $('section.posters > h1').html( months[now.getMonth()] + ' ' + now.getDate() );
});
</script>

If you want to extract the data, your best option is to use a browser driver such as Selenium.
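
For reference, the script above requests `/feature/bornondate_json?today=` followed by a date and reads the entries from `data.list`. The sketch below rebuilds that URL in Python 3 and decodes the response. It rests on several assumptions: that the endpoint lives on the same host as the page, that `toYYYYMMDD()` produces a string such as `20150307`, that the response is UTF-8 JSON, and that the server answers a plain request without the session the browser client uses. The helper names `bornondate_json_url` and `fetch_born_today` are hypothetical.

```python
import json
from datetime import date
from urllib.request import urlopen

BASE_URL = 'http://m.imdb.com'  # host taken from the question's URL (assumed to serve the JSON too)


def bornondate_json_url(day):
    """Build the URL the page's script requests, assuming a YYYYMMDD date format."""
    return BASE_URL + '/feature/bornondate_json?today=' + day.strftime('%Y%m%d')


def fetch_born_today(day):
    """Fetch and decode the JSON; the page's script reads the entries from data.list."""
    response = urlopen(bornondate_json_url(day))
    try:
        data = json.loads(response.read().decode('utf-8'))
    finally:
        response.close()
    # The shape of each entry is not shown in the source, so return the raw list.
    return data['list']


# Only the URL is built here; no network request is made until fetch_born_today() is called.
print(bornondate_json_url(date(2015, 3, 7)))
```

If the endpoint rejects such a request or its format has changed, a browser driver remains the fallback.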

Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/28912714
