I am trying to scrape data from this page: https://www.sofascore.com/betting-tips-today
I wrote the following code, but it doesn't work:
import requests
url = "https://www.sofascore.com/betting-tips-today"
r = requests.get(url).json()
print(r)
I also tried Selenium, but that doesn't work either:
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
# options.add_argument("--headless") #headless
options.add_argument('--no-sandbox')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
driver = webdriver.Chrome(executable_path=r"C:/chromedriver.exe", options=options)
u = "https://www.sofascore.com/betting-tips-today"
driver.get(u)
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div[class^='Content__PageContainer-sc-']")))
time.sleep(20)
elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("innerHTML")
soup = BeautifulSoup(driver.page_source, 'html.parser')
# print(len(soup.find_all('h2')))
# print(len(soup.select('.ivqpwB')))
parent_soup = soup.find('h2', text="Odds").parent.parent.select('div:nth-of-type(2) > div')
print(len(parent_soup))
for i in parent_soup:
    print(i)
Do you know how I can scrape the data on this page?
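For what it's worth, the requests attempt fails because the URL serves an HTML document, not JSON, so `.json()` raises a `JSONDecodeError`. A minimal sketch of that failure mode (the HTML string below is just a stand-in for the real response body):

```python
import json

# Stand-in for the HTML body the server actually returns; calling
# response.json() on it boils down to json.loads(), which raises
# because HTML is not valid JSON.
html_body = "<!DOCTYPE html><html><head>...</head></html>"
try:
    json.loads(html_body)
except json.JSONDecodeError as exc:
    print("not JSON:", exc)
```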
Posted on 2021-05-06 15:16:25
You could try something like this:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--ignore-certificate-errors")
options.add_argument("--incognito")
driver = webdriver.Chrome(
    executable_path=r"C:/chromedriver.exe", options=options
)
u = "https://www.sofascore.com/betting-tips-today"
driver.get(u)
# Get the page
WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located(
        (By.CSS_SELECTOR, "div[class^='Content__PageContainer-sc-']")
    )
)
time.sleep(20)
# Get the table
elem = driver.find_element_by_xpath(
    '//*[@id="__next"]/main/div/div[2]/div/div[1]/div[2]/table'
)
source_code = elem.get_attribute("innerHTML")
# Parse the html
soup = BeautifulSoup(driver.page_source, "html.parser")
# Get the interesting data for each row
data = []
for row in soup.find_all("tr")[4:]:
    infos = []
    for item in row.find_all("td"):
        for label in item.find_all("div"):
            infos.append(label.text)
        infos.append(item.text)
    data.append(infos[3:5] + infos[13:14] + infos[16:17] + infos[20:])
print(data)
# Outputs
[['La Guaira', 'América Cali', '3.25', '3.20', '13.25X3.2022.25',
'1', '', '1', '31%', 'wins 57%'], ['Hapoel Holon', 'Burgos', '2.75',
'1.40', '1', '36%', 'wins 60%'] ...]
You now have a list of lists (data), one inner list per table row. You can build a DataFrame from it with Pandas and go further from there.
https://stackoverflow.com/questions/67406092