Scraping Votes and Gross with Python and BeautifulSoup

Stack Overflow user
Asked 2019-12-29 00:23:43
1 answer · 911 views · Score 1

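I'm using BeautifulSoup to scrape an IMDb page (https://www.imdb.com/search/title/?release_date=2017&sort=num_votes,desc&page=1). I've managed to scrape the titles, years, descriptions, votes, directors, etc., but I'm having trouble with "Gross" and "Stars" (the actors).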

<p class="sort-num_votes-visible">
                <span class="text-muted">Votes:</span>
                <span name="nv" data-value="591671">591,671</span>
    <span class="ghost">|</span>                <span class="text-muted">Gross:</span>
                <span name="nv" data-value="226,277,068">$226.28M</span>
        </p>

<p class="">
    Director:
<a href="/name/nm0003506/?ref_=adv_li_dr_0">James Mangold</a>
                 <span class="ghost">|</span> 
    Stars:
<a href="/name/nm0413168/?ref_=adv_li_st_0">Hugh Jackman</a>, 
<a href="/name/nm0001772/?ref_=adv_li_st_1">Patrick Stewart</a>, 
<a href="/name/nm6748436/?ref_=adv_li_st_2">Dafne Keen</a>, 
<a href="/name/nm2933542/?ref_=adv_li_st_3">Boyd Holbrook</a>
    </p>
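A minimal sketch, assuming the Votes/Gross markup stays as shown above: each <span name="nv"> carries a data-value attribute holding the raw number, which can be easier to convert than the displayed .text ("$226.28M"). In this snippet the Gross data-value still contains thousands separators, so the sketch strips commas before calling int().

```python
from bs4 import BeautifulSoup

# The Votes/Gross markup from the question, inlined for illustration.
html = '''
<p class="sort-num_votes-visible">
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="591671">591,671</span>
    <span class="ghost">|</span>    <span class="text-muted">Gross:</span>
    <span name="nv" data-value="226,277,068">$226.28M</span>
</p>
'''

soup = BeautifulSoup(html, 'html.parser')
# 'name' is also find_all's tag-name parameter, so the attribute goes in attrs.
spans = soup.find_all('span', attrs={'name': 'nv'})

displayed = [s.text for s in spans]        # the text shown on the page
raw = [s['data-value'] for s in spans]     # the data-value attribute

# Remove thousands separators before converting to int.
numbers = [int(v.replace(',', '')) for v in raw]
print(displayed)  # ['591,671', '$226.28M']
print(numbers)    # [591671, 226277068]
```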

Here is the code I'm using:

import requests
from bs4 import BeautifulSoup

directors=[]
actors=[]
votes=[]
grosses=[]

res_movie = requests.get('https://www.imdb.com/search/title/?release_date='+'2018'+'&sort=num_votes,desc&page='+'1')
bs_movie = BeautifulSoup(res_movie.text,'html.parser')
movies=bs_movie.find_all('div', class_='lister-item mode-advanced')

for movie in movies:

    director=movie.find('p',class_='').find_all('a')[0].text
    directors.append(director)

    actors.append(movie.find('p',class_='').find_all('a')[1:].text) 

    vote=movie.find_all('span', attrs = {'name':'nv'})[0].text
    votes.append(vote)

    gross=movie.find_all('span', attrs = {'name':'nv'})[1].text
    grosses.append(gross)

The error I get for the actors:

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)
<ipython-input-70-a969b9a65fa7> in <module>
     60     directors.append(director)
     61 
---> 62     actors.append(movie.find('p',class_='').find_all('a')[:1].text)
     63 
     64 

AttributeError: 'list' object has no attribute 'text'

The error I get for the gross:

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)
<ipython-input-69-bd813766e1ca> in <module>
     74     votes.append(vote)
     75 
---> 76     gross=movie.find_all('span', attrs = {'name':'nv'})[1].text
     77     grosses.append(gross)
     78 # print(directors)

IndexError: list index out of range

I was hoping to use list indexes to get the elements I want. I'd be glad to learn the proper way to get these elements. Thanks in advance!

1 Answer

Stack Overflow user

Accepted answer

Answered 2019-12-29 00:38:29

Error for the actors:

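find_all() returns a list of the matching elements, and slicing it with [1:] gives another list. A list has no .text attribute, so you need to iterate over the list and get the text of each element.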
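A small sketch of that point, using the Director/Stars markup from the question inlined as a string (and soup.find('p') for simplicity): calling .text on the sliced list fails, while a list comprehension calls .text on each element.

```python
from bs4 import BeautifulSoup

# The Director/Stars markup from the question, inlined for illustration.
html = '''
<p class="">
    Director:
<a href="/name/nm0003506/?ref_=adv_li_dr_0">James Mangold</a>
    <span class="ghost">|</span>
    Stars:
<a href="/name/nm0413168/?ref_=adv_li_st_0">Hugh Jackman</a>,
<a href="/name/nm0001772/?ref_=adv_li_st_1">Patrick Stewart</a>,
<a href="/name/nm6748436/?ref_=adv_li_st_2">Dafne Keen</a>,
<a href="/name/nm2933542/?ref_=adv_li_st_3">Boyd Holbrook</a>
</p>
'''

p = BeautifulSoup(html, 'html.parser').find('p')
links = p.find_all('a')  # a ResultSet, which is a list subclass

# Slicing returns a plain list, which has no .text attribute.
error = None
try:
    links[1:].text
except AttributeError as exc:
    error = str(exc)

# Take .text from each element instead.
director = links[0].text
actors = [a.text for a in links[1:]]
print(director)  # James Mangold
print(actors)    # ['Hugh Jackman', 'Patrick Stewart', 'Dafne Keen', 'Boyd Holbrook']
```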

Error for the gross:

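For some movies there is no Gross figure, so find_all('span', attrs={'name': 'nv'}) returns only one element and index [1] is out of range. We need to check whether it exists first.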
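An alternative to the length check, sketched here as an assumption rather than part of the original answer: look up the value by its visible label ("Votes:" or "Gross:") and take the next <span name="nv"> sibling, so a missing label yields None instead of shifting indexes. The get_labeled_value helper and the no-Gross markup are hypothetical, built to mirror the question's snippet.

```python
from bs4 import BeautifulSoup


def get_labeled_value(container, label):
    """Hypothetical helper: return the text of the <span name="nv"> that
    follows the label span (e.g. 'Gross:'), or None if the label is absent."""
    label_span = container.find('span', string=label)
    if label_span is None:
        return None
    value_span = label_span.find_next_sibling('span', attrs={'name': 'nv'})
    return value_span.text if value_span is not None else None


with_gross = '''<p class="sort-num_votes-visible">
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="591671">591,671</span>
    <span class="ghost">|</span>    <span class="text-muted">Gross:</span>
    <span name="nv" data-value="226,277,068">$226.28M</span>
</p>'''

# Hypothetical markup for a movie that has no Gross figure.
without_gross = '''<p class="sort-num_votes-visible">
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="12345">12,345</span>
</p>'''

results = []
for html in (with_gross, without_gross):
    p = BeautifulSoup(html, 'html.parser').find('p')
    vote = get_labeled_value(p, 'Votes:')
    gross = get_labeled_value(p, 'Gross:') or '-'  # '-' as in the answer
    results.append((vote, gross))

print(results)  # [('591,671', '$226.28M'), ('12,345', '-')]
```

Matching on the label also keeps working if a movie ever had Gross but no Votes, which the index-based check would misread.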

Fixed version:

import requests
from bs4 import BeautifulSoup

directors=[]
actors=[]
votes=[]
grosses=[]

url = 'https://www.imdb.com/search/title/?release_date=2018&sort=num_votes,desc&page=1'
res_movie = requests.get(url)
bs_movie = BeautifulSoup(res_movie.text,'html.parser')
movies=bs_movie.find_all('div', class_='lister-item mode-advanced')

for movie in movies:
    director=movie.find('p',class_='').find_all('a')[0].text
    directors.append(director)

    actors.append([a.text for a in movie.find('p',class_='').find_all('a')[1:]])    # <-- using list comprehension

    nv = movie.find_all('span', attrs = {'name':'nv'})

    vote=nv[0].text
    votes.append(vote)

    gross= nv[1].text if len(nv) > 1 else '-'       # <-- check if Gross revenue exists for the movie
    grosses.append(gross)

# print the values:
for d, a, v, g in zip(directors, actors, votes, grosses):
    print('{:<22} {!s:<120} {:<12} {}'.format(d, a, v, g))

Prints:

Anthony Russo          ['Joe Russo', 'Robert Downey Jr.', 'Chris Hemsworth', 'Mark Ruffalo', 'Chris Evans']                                     734,642      $678.82M
Ryan Coogler           ['Chadwick Boseman', 'Michael B. Jordan', "Lupita Nyong'o", 'Danai Gurira']                                              557,058      $700.06M
David Leitch           ['Ryan Reynolds', 'Josh Brolin', 'Morena Baccarin', 'Julian Dennison']                                                   429,727      $324.59M
Bryan Singer           ['Rami Malek', 'Lucy Boynton', 'Gwilym Lee', 'Ben Hardy']                                                                398,775      $216.43M
John Krasinski         ['Emily Blunt', 'John Krasinski', 'Millicent Simmonds', 'Noah Jupe']                                                     339,291      $188.02M
Steven Spielberg       ['Tye Sheridan', 'Olivia Cooke', 'Ben Mendelsohn', 'Lena Waithe']                                                        324,204      $137.69M
James Wan              ['Jason Momoa', 'Amber Heard', 'Willem Dafoe', 'Patrick Wilson']                                                         317,403      $335.06M
Ruben Fleischer        ['Tom Hardy', 'Michelle Williams', 'Riz Ahmed', 'Scott Haze']                                                            316,446      $213.52M

...and so on.
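In the first row of that output, Joe Russo, who co-directed with Anthony Russo, lands in the actors list, because [1:] assumes exactly one director. In the question's markup, director links carry ref_=adv_li_dr_N and star links carry ref_=adv_li_st_N in their href. A sketch assuming IMDb keeps that pattern, filtering by it with a regular expression instead of by position (the two-director markup and its /name/ IDs are hypothetical placeholders):

```python
import re

from bs4 import BeautifulSoup

# Hypothetical two-director markup; the /name/PLACEHOLDER IDs are made up.
# Only the ref_=adv_li_dr_N / adv_li_st_N pattern comes from the question.
html = '''
<p class="">
    Directors:
<a href="/name/PLACEHOLDER1/?ref_=adv_li_dr_0">Anthony Russo</a>,
<a href="/name/PLACEHOLDER2/?ref_=adv_li_dr_1">Joe Russo</a>
    <span class="ghost">|</span>
    Stars:
<a href="/name/PLACEHOLDER3/?ref_=adv_li_st_0">Robert Downey Jr.</a>,
<a href="/name/PLACEHOLDER4/?ref_=adv_li_st_1">Chris Hemsworth</a>
</p>
'''

p = BeautifulSoup(html, 'html.parser').find('p')

# find_all accepts a compiled regex for an attribute; it is matched with search().
directors = [a.text for a in p.find_all('a', href=re.compile(r'adv_li_dr_'))]
stars = [a.text for a in p.find_all('a', href=re.compile(r'adv_li_st_'))]

print(directors)  # ['Anthony Russo', 'Joe Russo']
print(stars)      # ['Robert Downey Jr.', 'Chris Hemsworth']
```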
Score 2

Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/59515994
