I want to learn web scraping in Python, but I don't know how or where to start. My code runs, but it only returns an empty list.
from urllib.request import urlopen
from bs4 import BeautifulSoup
# import pandas as pd
html = urlopen("https://www.nba.com/games")
soup = BeautifulSoup(html, "lxml")
games = soup.find_all("li", class_="w-full flex flex-col flex-1 md:w-7/12 lg:w-5/12")
print(games)
Posted on 2021-10-22 11:12:46
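Your script returns an empty list because no <li> element on the page has the classes you specified. There is, however, a <div>. Changing the call to the following will work: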
games = soup.find_all("div", class_="shadow-block bg-white flex md:rounded text-sm relative mb-4")
This gives you:
[<div class="shadow-block bg-white flex md:rounded text-sm relative mb-4"><div class="w-full flex flex-col flex-1 md:w-7/12 lg:w-5/12"><a class="flex-1 px-2 pt-5 h-full block hover:no-underline relative text-sm pt-5 pb-4 mb-1 px-2" href="/game/dal-vs-atl-0022100014"><div class="flex"><article class="w-1/4"><figure class="mx-auto mb-2" style="width:52px;height:52px"><div class="TeamLogo_block__1FJrR"><img alt=" Logo" class="TeamLogo_logo__1CmT9" loading="lazy" src="https://cdn.nba.com/logos/nba/1610612742/primary/L/logo.svg" title=" Logo"/></div></figure><div class="flex justify-center items-center"><span class="whitespace-no-wrap">Mavericks</span></div><p class="leading-none text-center">-</p></article><div class="flex justify-center flex-1 text-center mt-3"><div class="w-1/3 text-left"></div><div class="flex-col items-start justify-start flex-1 w-full"><div class="flex flex-col items-center"><p class="text-xs uppercase mt-2">FINAL</p></div></div><div class="w-1/3 text-right"></div></div><article class="w-1/4"><figure class="mx-auto mb-2" style="width:52px;height:52px"><div class="TeamLogo_block__1FJrR"><img alt=" Logo" class="TeamLogo_logo__1CmT9" loading="lazy" src="https://cdn.nba.com/logos/nba/1610612737/primary/L/logo.svg" title=" Logo"/></div></figure><div class="flex justify-center items-center"><span class="whitespace-no-wrap">Hawks</span></div><p class="leading-none text-center">-</p></article></div></a><ul class="flex border-concrete border-t"><li class="TabLink_tab__1ugCW block flex-1"><a class="block py-3 text-xs font-bold text-center text-cerulean Anchor_complexLink__2NtkO" data-content="0022100014" data-id="nba:games:watch" data-premium="true" data-section="Watch" data-text="DAL @ ATL, 2021-10-21" data-track="video" href="/game/dal-vs-atl-0022100014?watch">WATCH</a></li><li class="TabLink_tab__1ugCW block flex-1"><a class="block py-3 text-xs font-bold text-center text-cerulean Anchor_complexLink__2NtkO" data-content="DAL @ ATL, 2021-10-21"
data-content-id="0022100014" data-id="nba:games:main:box-score:cta" data-text="BOX SCORE" data-track="click" data-type="cta" href="/game/dal-vs-atl-0022100014/box-score#box-score">BOX SCORE</a></li><li class="TabLink_tab__1ugCW block flex-1"><a class="block py-3 text-xs font-bold text-center text-cerulean Anchor_complexLink__2NtkO" data-content="DAL @ ATL, 2021-10-21" data-content-id="0022100014" data-id="nba:games:main:game-details:cta" data-text="GAME DETAILS" data-track="click" data-type="cta" href="/game/dal-vs-atl-0022100014">GAME DETAILS</a></li></ul></div><div class="w-full border-l border-concrete p-5 hidden md:block md:w-5/12 lg:w-5/12 xl:w-4/12 md:px-5 md:pt-3 lg:p-5"><div class="w-full"><p class="t7 mb-2">Game<!-- --> Leaders</p><table class="w-full"><thead class="text-xs font-condensed"><tr class="border-b border-asphalt text-asphalt"><th class="font-normal text-left">PLAYER</th><th class="font-normal text-right">PTS</th><th class="font-normal text-right">REB</th><th class="font-normal text-right">AST</th></tr></thead><tbody><tr class="border-b border-concrete"><td class="flex items-center w-full leading-tight py-2"><div class="w-6 h-6 mr-1"><p>-</p></div><div class="GameCardLeaders_player__2ZGgP"></div></td><td class="text-right">-</td><td class="text-right">-</td><td class="text-right">-</td></tr><tr class="border-b border-concrete"><td class="flex items-center w-full leading-tight py-2"><div class="w-6 h-6 mr-1"><p>-</p></div><div class="GameCardLeaders_player__2ZGgP"></div></td><td class="text-right">-</td><td class="text-right">-</td><td class="text-right">-</td></tr></tbody></table></div></div><div class="w-full p-5 hidden lg:w-2/12 lg:block xl:w-3/12 pl-0"><div class="w-full h-full"><p class="t7 mb-2">Game Recap</p>-</div></div></div>, <div class="shadow-block bg-white flex md:rounded text-sm relative mb-4"><div class="w-full flex flex-col flex-1 md:w-7/12 lg:w-5/12"><a class="flex-1 px-2 pt-5 h-full block hover:no-underline relative text-sm pt-5
pb-4 mb-1 px-2" href="/game/mil-vs-mia-0022100015"><div class="flex"><article class="w-1/4"><figure class="mx-auto mb-2" style="width:52px;height:52px"><div class="TeamLogo_block__1FJrR"><img alt=" Logo" class="TeamLogo_logo__1CmT9" loading="lazy" src="https://cdn.nba.com/logos/nba/1610612749/primary/L/logo.svg" title=" Logo"/></div></figure><div class="flex justify-center items-center"><span class="whitespace-no-wrap">Bucks</span></div><p class="leading-none text-center">-</p></article><div class="flex justify-center flex-1 text-center mt-3"><div class="w-1/3 text-left"></div><div class="flex-col items-start justify-start flex-1 w-full"><div class="flex flex-col items-center"><p class="text-xs uppercase mt-2">FINAL</p></div></div><div class="w-1/3 text-right"></div></div><article class="w-1/4"><figure class="mx-auto mb-2" style="width:52px;height:52px"><div class="TeamLogo_block__1FJrR"><img alt=" Logo" class="TeamLogo_logo__1CmT9" loading="lazy" src="https://cdn.nba.com/logos/nba/1610612748/primary/L/logo.svg" title=" Logo"/></div></figure><div class="flex justify-center items-center"><span class="whitespace-no-wrap">Heat</span></div><p class="leading-none text-center">-</p></article></div></a><ul class="flex border-concrete border-t"><li class="TabLink_tab__1ugCW block flex-1"><a class="block py-3 text-xs font-bold text-center text-cerulean Anchor_complexLink__2NtkO" data-content="0022100015" data-id="nba:games:watch" data-premium="true" data-section="Watch" data-text="MIL @ MIA, 2021-10-21" data-track="video" href="/game/mil-vs-mia-0022100015?watch">WATCH</a></li><li class="TabLink_tab__1ugCW block flex-1"><a class="block py-3 text-xs font-bold text-center text-cerulean Anchor_complexLink__2NtkO" data-content="MIL @ MIA, 2021-10-21" data-content-id="0022100015" data-id="nba:games:main:box-score:cta" data-text="BOX SCORE" data-track="click" data-type="cta" href="/game/mil-vs-mia-0022100015/box-score#box-score">BOX SCORE</a></li><li class="TabLink_tab__1ugCW
block flex-1"><a class="block py-3 text-xs font-bold text-center text-cerulean Anchor_complexLink__2NtkO" data-content="MIL @ MIA, 2021-10-21" data-content-id="0022100015" data-id="nba:games:main:game-details:cta" data-text="GAME DETAILS" data-track="click" data-type="cta" href="/game/mil-vs-mia-0022100015">GAME DETAILS</a></li></ul></div><div class="w-full border-l border-concrete p-5 hidden md:block md:w-5/12 lg:w-5/12 xl:w-4/12 md:px-5 md:pt-3 lg:p-5"><div class="w-full"><p class="t7 mb-2">Game<!-- --> Leaders</p><table class="w-full"><thead class="text-xs font-condensed"><tr class="border-b border-asphalt text-asphalt"><th class="font-normal text-left">PLAYER</th><th class="font-normal text-right">PTS</th><th class="font-normal text-right">REB</th><th class="font-normal text-right">AST</th></tr></thead><tbody><tr class="border-b border-concrete"><td class="flex items-center w-full leading-tight py-2"><div class="w-6 h-6 mr-1"><p>-</p></div><div class="GameCardLeaders_player__2ZGgP"></div></td><td class="text-right">-</td><td class="text-right">-</td><td class="text-right">-</td></tr><tr class="border-b border-concrete"><td class="flex items-center w-full leading-tight py-2"><div class="w-6 h-6 mr-1"><p>-</p></div><div class="GameCardLeaders_player__2ZGgP"></div></td><td class="text-right">-</td><td class="text-right">-</td><td class="text-right">-</td></tr></tbody></table></div></div><div class="w-full p-5 hidden lg:w-2/12 lg:block xl:w-3/12 pl-0"><div class="w-full h-full"><p class="t7 mb-2">Game Recap</p>-</div></div></div>, <div class="shadow-block bg-white flex md:rounded text-sm relative mb-4"><div class="w-full flex flex-col flex-1 md:w-7/12 lg:w-5/12"><a class="flex-1 px-2 pt-5 h-full block hover:no-underline relative text-sm pt-5 pb-4 mb-1 px-2" href="/game/lac-vs-gsw-0022100016"><div class="flex"><article class="w-1/4"><figure class="mx-auto mb-2" style="width:52px;height:52px"><div class="TeamLogo_block__1FJrR"><img alt=" Logo"
class="TeamLogo_logo__1CmT9" loading="lazy" src="https://cdn.nba.com/logos/nba/1610612746/primary/L/logo.svg" title=" Logo"/></div></figure><div class="flex justify-center items-center"><span class="whitespace-no-wrap">Clippers</span></div><p class="leading-none text-center">-</p></article><div class="flex justify-center flex-1 text-center mt-3"><div class="w-1/3 text-left"></div><div class="flex-col items-start justify-start flex-1 w-full"><div class="flex flex-col items-center"><p class="text-xs uppercase mt-2">FINAL</p></div></div><div class="w-1/3 text-right"></div></div><article class="w-1/4"><figure class="mx-auto mb-2" style="width:52px;height:52px"><div class="TeamLogo_block__1FJrR"><img alt=" Logo" class="TeamLogo_logo__1CmT9" loading="lazy" src="https://cdn.nba.com/logos/nba/1610612744/primary/L/logo.svg" title=" Logo"/></div></figure><div class="flex justify-center items-center"><span class="whitespace-no-wrap">Warriors</span></div><p class="leading-none text-center">-</p></article></div></a><ul class="flex border-concrete border-t"><li class="TabLink_tab__1ugCW block flex-1"><a class="block py-3 text-xs font-bold text-center text-cerulean Anchor_complexLink__2NtkO" data-content="0022100016" data-id="nba:games:watch" data-premium="true" data-section="Watch" data-text="LAC @ GSW, 2021-10-21" data-track="video" href="/game/lac-vs-gsw-0022100016?watch">WATCH</a></li><li class="TabLink_tab__1ugCW block flex-1"><a class="block py-3 text-xs font-bold text-center text-cerulean Anchor_complexLink__2NtkO" data-content="LAC @ GSW, 2021-10-21" data-content-id="0022100016" data-id="nba:games:main:box-score:cta" data-text="BOX SCORE" data-track="click" data-type="cta" href="/game/lac-vs-gsw-0022100016/box-score#box-score">BOX SCORE</a></li><li class="TabLink_tab__1ugCW block flex-1"><a class="block py-3 text-xs font-bold text-center text-cerulean Anchor_complexLink__2NtkO" data-content="LAC @ GSW, 2021-10-21" data-content-id="0022100016"
data-id="nba:games:main:game-details:cta" data-text="GAME DETAILS" data-track="click" data-type="cta" href="/game/lac-vs-gsw-0022100016">GAME DETAILS</a></li></ul></div><div class="w-full border-l border-concrete p-5 hidden md:block md:w-5/12 lg:w-5/12 xl:w-4/12 md:px-5 md:pt-3 lg:p-5"><div class="w-full"><p class="t7 mb-2">Game<!-- --> Leaders</p><table class="w-full"><thead class="text-xs font-condensed"><tr class="border-b border-asphalt text-asphalt"><th class="font-normal text-left">PLAYER</th><th class="font-normal text-right">PTS</th><th class="font-normal text-right">REB</th><th class="font-normal text-right">AST</th></tr></thead><tbody><tr class="border-b border-concrete"><td class="flex items-center w-full leading-tight py-2"><div class="w-6 h-6 mr-1"><p>-</p></div><div class="GameCardLeaders_player__2ZGgP"></div></td><td class="text-right">-</td><td class="text-right">-</td><td class="text-right">-</td></tr><tr class="border-b border-concrete"><td class="flex items-center w-full leading-tight py-2"><div class="w-6 h-6 mr-1"><p>-</p></div><div class="GameCardLeaders_player__2ZGgP"></div></td><td class="text-right">-</td><td class="text-right">-</td><td class="text-right">-</td></tr></tbody></table></div></div><div class="w-full p-5 hidden lg:w-2/12 lg:block xl:w-3/12 pl-0"><div class="w-full h-full"><p class="t7 mb-2">Game Recap</p>-</div></div></div>]
Posted on 2021-10-22 11:24:40
Try following my code:
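from bs4 import BeautifulSoup
import requests
# Download the page and inspect the raw HTML that came back
source = requests.get('https://www.nba.com/games').text
print(source)
soup = BeautifulSoup(source, 'lxml')
print(soup)
# find() returns None when nothing matches, so check before reading .text
game = soup.find('div', class_='w-full flex flex-col flex-1 md:w-7/12 lg:w-5/12')
games = game.text if game is not None else ''
print(games)
Posted on 2021-10-22 11:33:26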
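Both snippets above pass the element's full class string to BeautifulSoup, which only matches while the site keeps exactly those classes in exactly that order. A minimal sketch of a looser match on a single class follows. It assumes a small hand-written HTML fragment (SAMPLE_HTML, hypothetical, modeled on the output shown earlier) rather than the live page, whose markup may differ or change; team_names_per_game is an illustrative helper, not part of any answer.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the game cards in the output above;
# the real page markup may differ and can change at any time.
SAMPLE_HTML = """
<div class="shadow-block bg-white flex md:rounded text-sm relative mb-4">
  <span class="whitespace-no-wrap">Mavericks</span>
  <span class="whitespace-no-wrap">Hawks</span>
</div>
<div class="shadow-block bg-white flex md:rounded text-sm relative mb-4">
  <span class="whitespace-no-wrap">Bucks</span>
  <span class="whitespace-no-wrap">Heat</span>
</div>
"""


def team_names_per_game(html):
    """Return one list of team names per game card found in html."""
    soup = BeautifulSoup(html, "html.parser")
    games = []
    # A single class name matches any tag whose class list contains it,
    # regardless of the other classes or their order.
    for card in soup.find_all("div", class_="shadow-block"):
        names = [span.get_text(strip=True)
                 for span in card.find_all("span", class_="whitespace-no-wrap")]
        games.append(names)
    return games


# No <li> carries these layout classes, so the original search finds nothing.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(soup.find_all("li", class_="w-full flex flex-col flex-1 md:w-7/12 lg:w-5/12"))  # []
print(team_names_per_game(SAMPLE_HTML))  # [['Mavericks', 'Hawks'], ['Bucks', 'Heat']]
```

The sketch uses the built-in html.parser so it runs without lxml; swapping in "lxml" should behave the same for this fragment.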
Rather than parsing the HTML, fetch the data directly. I'm not sure which data you want, but it is all in the JSON.
import requests
# Fetch today's scoreboard as JSON from the NBA CDN
jsonData = requests.get("https://cdn.nba.com/static/json/liveData/scoreboard/todaysScoreboard_00.json").json()
scoreboard = jsonData['scoreboard']
gameDate = scoreboard['gameDate']
print(gameDate)
# Print one line per game in "away: score @ home: score" form
for game in scoreboard['games']:
    homeTeam = game['homeTeam']['teamCity'] + ' ' + game['homeTeam']['teamName']
    homeScore = game['homeTeam']['score']
    awayTeam = game['awayTeam']['teamCity'] + ' ' + game['awayTeam']['teamName']
    awayScore = game['awayTeam']['score']
    print(f'{awayTeam}: {awayScore} @ {homeTeam}: {homeScore}')
Output:
2021-10-21
Dallas Mavericks: 87 @ Atlanta Hawks: 113
Milwaukee Bucks: 95 @ Miami Heat: 137
LA Clippers: 113 @ Golden State Warriors: 115
https://stackoverflow.com/questions/69675783
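The JSON approach in the last answer can be split into a download step and a pure formatting step, so the formatting can be checked without network access. This is only a sketch under assumptions: it swaps requests for the standard library's urllib.request, the names fetch_scoreboard and format_games are made up for illustration, and the sample dictionary is hand-written in the shape the answer's code reads, with values copied from the output above. Whether the endpoint stays available or accepts requests without extra headers is not guaranteed.

```python
import json
from urllib.request import urlopen

# URL taken from the answer above; its availability is an assumption.
SCOREBOARD_URL = ("https://cdn.nba.com/static/json/liveData/"
                  "scoreboard/todaysScoreboard_00.json")


def fetch_scoreboard(url=SCOREBOARD_URL, timeout=10):
    """Download and decode the scoreboard JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)["scoreboard"]


def format_games(scoreboard):
    """Build one 'away: score @ home: score' line per game."""
    lines = []
    for game in scoreboard.get("games", []):
        home, away = game["homeTeam"], game["awayTeam"]
        lines.append(
            f"{away['teamCity']} {away['teamName']}: {away['score']} @ "
            f"{home['teamCity']} {home['teamName']}: {home['score']}"
        )
    return lines


# Hand-written sample in the same shape, values copied from the output above.
sample = {
    "gameDate": "2021-10-21",
    "games": [{
        "homeTeam": {"teamCity": "Atlanta", "teamName": "Hawks", "score": 113},
        "awayTeam": {"teamCity": "Dallas", "teamName": "Mavericks", "score": 87},
    }],
}
print(format_games(sample))  # ['Dallas Mavericks: 87 @ Atlanta Hawks: 113']
```

With network access, format_games(fetch_scoreboard()) would produce the same kind of lines for the current day's games.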