
Scraping sports data with requests or Selenium

Stack Overflow user
Asked on 2021-05-06 01:29:49
1 answer · 106 views · 0 followers · score 1

I am trying to scrape data from this page: https://www.sofascore.com/betting-tips-today

I wrote the following code, but it doesn't work:

```python
import requests

url = "https://www.sofascore.com/betting-tips-today"

r = requests.get(url).json()

print(r)
```
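The `.json()` call fails here because the server answers with an HTML page, not JSON, so the JSON parser raises `json.JSONDecodeError`. A minimal offline sketch of that failure mode (the `html` string below is an invented stand-in for the real response body):

```python
import json

# Stand-in for the HTML body the page actually returns;
# Response.json() runs json.loads() on exactly this kind of text.
html = "<!DOCTYPE html><html><head><title>Sofascore</title></head></html>"

try:
    json.loads(html)
except json.JSONDecodeError as exc:
    # The very first character '<' is already invalid JSON.
    print(f"Not JSON: parse error at position {exc.pos}")
```

So plain `requests` only helps if you find an endpoint that actually serves JSON; the rendered page itself needs a browser.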

I also tried Selenium, but that doesn't work either:

```python
from bs4 import BeautifulSoup
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
# options.add_argument("--headless")          # headless
options.add_argument('--no-sandbox')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')

driver = webdriver.Chrome(executable_path=r"C:/chromedriver.exe", options=options)

u = "https://www.sofascore.com/betting-tips-today"
driver.get(u)

WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div[class^='Content__PageContainer-sc-']")))

time.sleep(20)
elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("innerHTML")

soup = BeautifulSoup(driver.page_source, 'html.parser')
# print(len(soup.find_all('h2')))
# print(len(soup.select('.ivqpwB')))
parent_soup = soup.find('h2', text=("Odds")).parent.parent.select('div:nth-of-type(2) > div')
print(len(parent_soup))
for i in parent_soup:
    print(i)
```

Do you know how I can scrape the data on this page?


1 Answer

Stack Overflow user

Answered on 2021-05-06 15:16:25

You can try something like this:

```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--ignore-certificate-errors")
options.add_argument("--incognito")

driver = webdriver.Chrome(
    executable_path=r"C:/chromedriver.exe", options=options
)
u = "https://www.sofascore.com/betting-tips-today"
driver.get(u)

# Get the page
WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located(
        (By.CSS_SELECTOR, "div[class^='Content__PageContainer-sc-']")
    )
)
time.sleep(20)

# Get the table
elem = driver.find_element_by_xpath(
    '//*[@id="__next"]/main/div/div[2]/div/div[1]/div[2]/table'
)
source_code = elem.get_attribute("innerHTML")

# Parse the html
soup = BeautifulSoup(driver.page_source, "html.parser")

# Get the interesting data for each row
data = []
for row in soup.find_all("tr")[4:]:
    infos = []
    for item in row.find_all("td"):
        for label in item.find_all("div"):
            infos.append(label.text)
        infos.append(item.text)
    data.append(infos[3:5] + infos[13:14] + infos[16:17] + infos[20:])

print(data)
# Outputs
# [['La Guaira', 'América Cali', '3.25', '3.20', '13.25X3.2022.25',
# '1', '', '1', '31%', 'wins 57%'], ['Hapoel Holon', 'Burgos', '2.75',
# '1.40', '1', '36%', 'wins 60%'] ...]
```
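The row-scraping loop above can be exercised offline on a toy table. The markup below is invented for illustration (Sofascore's real class names and nesting differ), but it shows why each cell contributes its inner `<div>` texts first and then its combined text:

```python
from bs4 import BeautifulSoup

# Invented markup standing in for one row of the odds table.
html = """<table>
<tr><td><div>La Guaira</div><div>America Cali</div></td><td>3.25</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
data = []
for row in soup.find_all("tr"):
    infos = []
    for item in row.find_all("td"):
        for label in item.find_all("div"):
            infos.append(label.text)  # each inner <div> label first...
        infos.append(item.text)       # ...then the cell's combined text
    data.append(infos)

print(data)  # [['La Guaira', 'America Cali', 'La GuairaAmerica Cali', '3.25']]
```

The combined-text entry duplicates the labels, which is why the answer slices `infos` afterwards to keep only the useful positions.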

You now have a list of lists in `data` (one list per table row). From there you can build a pandas DataFrame and do further processing.
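A minimal sketch of that pandas step, using two sample rows shaped like the scraper's output (the column names here are guesses for illustration; pick names matching the fields you actually kept):

```python
import pandas as pd

# Sample rows shaped like the scraped output; column names are invented.
data = [
    ["La Guaira", "America Cali", "3.25", "3.20"],
    ["Hapoel Holon", "Burgos", "2.75", "1.40"],
]
df = pd.DataFrame(data, columns=["home", "away", "odds_1", "odds_x"])

# The scraper yields strings, so cast the odds columns to floats
# before doing any arithmetic on them.
df[["odds_1", "odds_x"]] = df[["odds_1", "odds_x"]].astype(float)

print(df)
```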

Score: 0
Original page content provided by Stack Overflow; translation supported by Tencent Cloud's IT-domain translation engine.
Original link: https://stackoverflow.com/questions/67406092