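I am using BeautifulSoup to scrape an IMDb page (https://www.imdb.com/search/title/?release_date=2017&sort=num_votes,desc&page=1). I have successfully scraped the name, year, description, votes, director, and so on, but I am having trouble with "gross" and "actors".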
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="591671">591,671</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span name="nv" data-value="226,277,068">$226.28M</span>
</p><p class="">
Director:
<a href="/name/nm0003506/?ref_=adv_li_dr_0">James Mangold</a>
<span class="ghost">|</span>
Stars:
<a href="/name/nm0413168/?ref_=adv_li_st_0">Hugh Jackman</a>,
<a href="/name/nm0001772/?ref_=adv_li_st_1">Patrick Stewart</a>,
<a href="/name/nm6748436/?ref_=adv_li_st_2">Dafne Keen</a>,
<a href="/name/nm2933542/?ref_=adv_li_st_3">Boyd Holbrook</a>
</p>

Here is the code I am using:
import requests
from bs4 import BeautifulSoup

directors=[]
actors=[]
votes=[]
grosses=[]
res_movie = requests.get('http://www.imdb.com/search/title?release_date='+'2018'+'&sort=num_votes,desc&page='+'1')
bs_movie = BeautifulSoup(res_movie.text,'html.parser')
movies=bs_movie.find_all('div', class_='lister-item mode-advanced')
for movie in movies:
    director=movie.find('p',class_='').find_all('a')[0].text
    directors.append(director)
    actors.append(movie.find('p',class_='').find_all('a')[1:].text)
    vote=movie.find_all('span', attrs = {'name':'nv'})[0].text
    votes.append(vote)
    gross=movie.find_all('span', attrs = {'name':'nv'})[1].text
    grosses.append(gross)

The error I get for actors:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-70-a969b9a65fa7> in <module>
60 directors.append(director)
61
---> 62 actors.append(movie.find('p',class_='').find_all('a')[:1].text)
63
64
AttributeError: 'list' object has no attribute 'text'

The error I get for gross:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-69-bd813766e1ca> in <module>
74 votes.append(vote)
75
---> 76 gross=movie.find_all('span', attrs = {'name':'nv'})[1].text
77 grosses.append(gross)
78 # print(directors)
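IndexError: list index out of range

I was hoping to use list indexing to get the elements I want. I would be glad to learn the proper way to get these elements. Thanks in advance!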
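Posted on 2019-12-29 00:38:29

The error for actors:
find_all() returns a list of the elements it found, and a list has no .text attribute, so you need to iterate over the list and read the text of each element.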
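To make the difference concrete, here is a minimal sketch that runs against the credits markup quoted in the question instead of the live page. It calls `find_all("a")` on the snippet directly (an assumption made so the example needs no network access) and shows that a single element has `.text` while a slice of the result does not.

```python
from bs4 import BeautifulSoup

# Credits paragraph copied from the question; no network request is made.
html = """
<p class="">
Director:
<a href="/name/nm0003506/?ref_=adv_li_dr_0">James Mangold</a>
<span class="ghost">|</span>
Stars:
<a href="/name/nm0413168/?ref_=adv_li_st_0">Hugh Jackman</a>,
<a href="/name/nm0001772/?ref_=adv_li_st_1">Patrick Stewart</a>,
<a href="/name/nm6748436/?ref_=adv_li_st_2">Dafne Keen</a>,
<a href="/name/nm2933542/?ref_=adv_li_st_3">Boyd Holbrook</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")

# Indexing yields a single Tag, which has a .text attribute.
director = links[0].text

# Slicing yields a plain list, which has no .text attribute,
# so the text is read from each element in turn.
stars = [a.text for a in links[1:]]

print(director)
print(stars)
print(hasattr(links[1:], "text"))
```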
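The error for gross:
Some movies have no gross revenue listed, so there is only one span with name="nv" and index 1 does not exist. We need to check whether the value is present first.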
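As an alternative to checking the length of the list, one possible sketch looks up the value through its "Gross:" label, so it does not depend on the position of the span. The helper name `extract_gross` and the second sample (a movie without gross, with made-up vote numbers) are hypothetical; the first sample is the markup from the question.

```python
from bs4 import BeautifulSoup

def extract_gross(block, default="-"):
    """Return the text of the span that follows the 'Gross:' label, or default."""
    label = block.find("span", string="Gross:")
    if label is None:
        return default  # this movie has no gross figure listed
    value = label.find_next_sibling("span", attrs={"name": "nv"})
    return value.text if value is not None else default

# Markup from the question: votes and gross are both present.
with_gross = BeautifulSoup("""
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="591671">591,671</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span name="nv" data-value="226,277,068">$226.28M</span>
</p>
""", "html.parser")

# Hypothetical markup for a movie without a gross figure (values are made up).
without_gross = BeautifulSoup("""
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="1234">1,234</span>
</p>
""", "html.parser")

print(extract_gross(with_gross))
print(extract_gross(without_gross))
```

In the scraping loop this would be called once per `movie` element in place of indexing into the `nv` list.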
Fixed version:
import requests
from bs4 import BeautifulSoup

directors=[]
actors=[]
votes=[]
grosses=[]
url = 'https://www.imdb.com/search/title/?release_date=2018&sort=num_votes,desc&page=1'
res_movie = requests.get(url)
bs_movie = BeautifulSoup(res_movie.text,'html.parser')
movies=bs_movie.find_all('div', class_='lister-item mode-advanced')
for movie in movies:
    director=movie.find('p',class_='').find_all('a')[0].text
    directors.append(director)
    actors.append([a.text for a in movie.find('p',class_='').find_all('a')[1:]]) # <-- using list comprehension
    nv = movie.find_all('span', attrs = {'name':'nv'})
    vote=nv[0].text
    votes.append(vote)
    gross= nv[1].text if len(nv) > 1 else '-' # <-- check if Gross revenue exists for the movie
    grosses.append(gross)

# print the values:
for d, a, v, g in zip(directors, actors, votes, grosses):
    print('{:<22} {!s:<120} {:<12} {}'.format(d, a, v, g))

Prints:
Anthony Russo ['Joe Russo', 'Robert Downey Jr.', 'Chris Hemsworth', 'Mark Ruffalo', 'Chris Evans'] 734,642 $678.82M
Ryan Coogler ['Chadwick Boseman', 'Michael B. Jordan', "Lupita Nyong'o", 'Danai Gurira'] 557,058 $700.06M
David Leitch ['Ryan Reynolds', 'Josh Brolin', 'Morena Baccarin', 'Julian Dennison'] 429,727 $324.59M
Bryan Singer ['Rami Malek', 'Lucy Boynton', 'Gwilym Lee', 'Ben Hardy'] 398,775 $216.43M
John Krasinski ['Emily Blunt', 'John Krasinski', 'Millicent Simmonds', 'Noah Jupe'] 339,291 $188.02M
Steven Spielberg ['Tye Sheridan', 'Olivia Cooke', 'Ben Mendelsohn', 'Lena Waithe'] 324,204 $137.69M
James Wan ['Jason Momoa', 'Amber Heard', 'Willem Dafoe', 'Patrick Wilson'] 317,403 $335.06M
Ruben Fleischer ['Tom Hardy', 'Michelle Williams', 'Riz Ahmed', 'Scott Haze'] 316,446 $213.52M
...and so on.

https://stackoverflow.com/questions/59515994
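One caveat visible in the first row of the output: 'Joe Russo' lands in the actors list because that movie has two directors, and the code treats only the first link as a director. A possible sketch that avoids this splits the links by the `adv_li_dr_` and `adv_li_st_` markers seen in the hrefs of the question's markup. That these markers appear on every row is an assumption, and the helper name `split_credits` and the two-director sample (placeholder names and ids) are hypothetical.

```python
from bs4 import BeautifulSoup

def split_credits(credit_block):
    """Split the links of a credits paragraph into (directors, stars).

    Assumes director links contain 'adv_li_dr_' and star links contain
    'adv_li_st_' in their href, as in the markup quoted in the question.
    """
    directors, stars = [], []
    for a in credit_block.find_all("a"):
        href = a.get("href", "")
        if "adv_li_dr_" in href:
            directors.append(a.text)
        elif "adv_li_st_" in href:
            stars.append(a.text)
    return directors, stars

# Hypothetical credits paragraph with two directors (placeholder names and ids).
two_directors = BeautifulSoup("""
<p class="">
Directors:
<a href="/name/nm0000001/?ref_=adv_li_dr_0">Director One</a>,
<a href="/name/nm0000002/?ref_=adv_li_dr_1">Director Two</a>
<span class="ghost">|</span>
Stars:
<a href="/name/nm0000003/?ref_=adv_li_st_0">Star One</a>,
<a href="/name/nm0000004/?ref_=adv_li_st_1">Star Two</a>
</p>
""", "html.parser")

movie_directors, movie_stars = split_credits(two_directors)
print(movie_directors)
print(movie_stars)
```

With this approach `directors` would hold a list per movie rather than a single string, so the print format at the end of the fixed version would need `!s` for that column as well.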