Q: I am trying to scrape a website using the beautifulsoup4 and requests libraries

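Stack Overflow user

Asked 2021-05-04 23:18:32

Answers: 1, Views: 31, Followers: 0, Votes: 0

I want to extract the name, year, and length of each movie from this website.

The code is as follows: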

import requests
from bs4 import BeautifulSoup

URL = 'https://www4.f2movies.to'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

#Trending Movies
Movies = []
Year = []
Length = []

for a in soup.findAll('a', href=True, attrs={'class':"film-detail film-detail-fix"}):
    name=data.find('div', href=True, attrs={'class':'film-name'})
    year=data.find('span', href=True, attrs={'class':'fdi-item'})
    length=data.find('span', href=True, attrs={'class':'fdi-item fdi-duration'})
    Movies.append(name.text)
    Year.append(year.text)
    Length.append(length.text)

print(Movies)
print(Year)
print(Length)

The result I get looks like this:

(Projects) anildhage@xxx-MacBook-Air WebScrape % python scrape.py
[]
[]
[]
(Projects) anildhage@xxx-MacBook-Air WebScrape %

Can anyone suggest where I went wrong? TIA

python

beautifulsoup

1 Answer

Stack Overflow user

Answered 2021-05-04 23:29:19

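Some of your selectors are incorrect. The class film-detail film-detail-fix belongs to a div rather than an a, the title sits in an h3 rather than a div, and href=True rejects the div and span elements because they have no href attribute. The loop variable is also named a while the loop body refers to data. To get all the data, use the following example: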

import requests
from bs4 import BeautifulSoup

URL = "https://www4.f2movies.to"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Trending Movies
Movies = []
Year = []
Length = []

# Each movie card is a <div>; the title is in an <h3>, year and length in <span>s.
for data in soup.find_all("div", attrs={"class": "film-detail film-detail-fix"}):
    name = data.find("h3", attrs={"class": "film-name"})
    year = data.find("span", attrs={"class": "fdi-item"})
    length = data.find("span", attrs={"class": "fdi-item fdi-duration"})
    # Skip cards that have no duration element.
    if not length:
        continue

    Movies.append(name.text.strip())
    Year.append(year.text)
    Length.append(length.text)


print(Movies)
print(Year)
print(Length)

Output:

["Tom Clancy's Without Remorse", 'The Mitchells vs. The Machines', 'Mortal Kombat', 'Things Heard & Seen', 'Demon Slayer the Movie: Mugen Train', 'Voyagers', 'Tom & Jerry', 'Godzilla vs. Kong', 'Justice Society: World War II', 'Nomadland', 'The Virtuoso', 'Shadow in the Cloud', 'Nobody', 'Skylines', "Zack Snyder's Justice League", 'Stowaway', '22 vs. Earth', 'The Marksman', 'The Little Things', 'Wonder Woman 1984', 'Raya and the Last Dragon', 'The Father', 'SAS: Red Notice', 'Come True', 'The Lockdown Hauntings', 'The Bike Thief', 'Generation Por Que', 'Adolescents of Chymera', 'The Darkness', 'The Rise of Sir Longbottom', 'Mexican Moon', "She was the Deputy's Wife", '100m Criminal Conviction', 'Percy', 'The Mitchells vs. The Machines', 'Zombie with a Shotgun', 'Things Heard & Seen', 'Golden Arm', 'Bang! Bang!', 'Colors of Love', 'Three Pints and a Rabbi', 'Eat Wheaties!', "Before I'm Dead", '22 vs. Earth', 'The Outside Story', 'Voyagers', 'Ape vs. Monster', 'Pipeline']
['2021', '2021', '2021', '2021', '2020', '2021', '2021', '2021', '2021', '2020', '2021', '2020', '2021', '2020', '2021', '2021', '2021', '2021', '2021', '2020', '2021', '2020', '2021', '2021', '2021', '2020', '2021', '2021', '2021', '2021', '2021', '2021', '2021', '2021', '2021', '2019', '2021', '2021', '2020', '2021', '2021', '2020', '0000', '2021', '2021', '2021', '2021', '2021']
['109m', '113m', '110m', '121m', '117m', '108m', '90m', '113m', 'N/A', '108m', '105m', '83m', '92m', '110m', '242m', '116m', '5m', '108m', '127m', '151m', '112m', '97m', '120m', '105m', '101m', '79m', 'N/A', '81m', 'N/A', '73m', '84m', '95m', '92m', '109m', '113m', '79m', '121m', '90m', '71m', '110m', '85m', 'N/A', '83m', '5m', '85m', '108m', '90m', '85m']
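
The difference between the question's selectors and the working ones can be checked offline. The sketch below runs both against a small hand-written HTML fragment and also shows a CSS-selector variant of the same extraction. The fragment (including the fd-infor wrapper and the example titles) and the extract_films helper are assumptions made up for illustration; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names used in the answer above.
# This is NOT the site's real HTML.
SAMPLE_HTML = """
<div class="film-detail film-detail-fix">
  <h3 class="film-name"><a href="/movie/example-1">Example Movie</a></h3>
  <div class="fd-infor">
    <span class="fdi-item">2021</span>
    <span class="fdi-item fdi-duration">109m</span>
  </div>
</div>
<div class="film-detail film-detail-fix">
  <h3 class="film-name"><a href="/tv/example-2">Example Show</a></h3>
  <div class="fd-infor">
    <span class="fdi-item">SS 1</span>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The question's outer selector: the class is on a <div>, not an <a>,
# so no <a> tag matches and the loop body never runs.
question_matches = soup.find_all(
    "a", href=True, attrs={"class": "film-detail film-detail-fix"}
)

# href=True requires an href attribute, which the <span> does not have,
# so this lookup returns None even inside a matching card.
card = soup.find("div", attrs={"class": "film-detail film-detail-fix"})
span_with_href = card.find("span", href=True, attrs={"class": "fdi-item"})


def extract_films(soup):
    """Collect name, year, and length per card using CSS selectors."""
    films = []
    for card in soup.select("div.film-detail.film-detail-fix"):
        name = card.select_one("h3.film-name")
        year = card.select_one("span.fdi-item")
        length = card.select_one("span.fdi-item.fdi-duration")
        # Skip cards that lack any of the three fields.
        if not (name and year and length):
            continue
        films.append(
            {
                "name": name.get_text(strip=True),
                "year": year.get_text(strip=True),
                "length": length.get_text(strip=True),
            }
        )
    return films


print(len(question_matches))
print(span_with_href)
print(extract_films(soup))
```

Unlike attrs={"class": "fdi-item fdi-duration"}, which compares against the class attribute as one exact string, a selector such as span.fdi-item.fdi-duration matches the two classes in any order. Keeping each movie in one dictionary also avoids three parallel lists drifting out of step when a card is skipped.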

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/67387553
