I am trying to scrape data from this page: https://www.sofascore.com/betting-tips-today
I wrote the following code, but it doesn't work:
import requests
url = "https://www.sofascore.com/betting-tips-today"
r = requests.get(url).json()
print(r)
I also tried Selenium, but that doesn't work either:
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
# options.add_argument("--headless") #headless
options.add_argument('--no-sandbox')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
driver = webdriver.Chrome(executable_path=r"C:/chromedriver.exe", options=options)
u = "https://www.sofascore.com/betting-tips-today"
driver.get(u)
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div[class^='Content__PageContainer-sc-']")))
time.sleep(20)
elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("innerHTML")
soup = BeautifulSoup(driver.page_source, 'html.parser')
# print(len(soup.find_all('h2')))
# print(len(soup.select('.ivqpwB')))
parent_soup = soup.find('h2', text="Odds").parent.parent.select('div:nth-of-type(2) > div')
print(len(parent_soup))
for i in parent_soup:
    print(i)
Do you know how I can scrape the data on this page?
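For what it's worth, the requests attempt fails because the URL serves an HTML document, not JSON, so `.json()` raises a `JSONDecodeError`. A minimal sketch of that failure mode (the HTML string below is just a stand-in for the real response body):

```python
import json

# Stand-in for the HTML body the server actually returns; calling
# response.json() on it boils down to json.loads(), which raises
# because HTML is not valid JSON.
html_body = "<!DOCTYPE html><html><head>...</head></html>"
try:
    json.loads(html_body)
except json.JSONDecodeError as exc:
    print("not JSON:", exc)
```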
Posted on 2021-05-06 15:16:25
You could try something like this:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--ignore-certificate-errors")
options.add_argument("--incognito")
driver = webdriver.Chrome(
    executable_path=r"C:/chromedriver.exe", options=options
)
u = "https://www.sofascore.com/betting-tips-today"
driver.get(u)
# Get the page
WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located(
        (By.CSS_SELECTOR, "div[class^='Content__PageContainer-sc-']")
    )
)
time.sleep(20)
# Get the table
elem = driver.find_element_by_xpath(
    '//*[@id="__next"]/main/div/div[2]/div/div[1]/div[2]/table'
)
source_code = elem.get_attribute("innerHTML")
# Parse the html
soup = BeautifulSoup(driver.page_source, "html.parser")
# Get the interesting data for each row
data = []
for row in soup.find_all("tr")[4:]:
    infos = []
    for item in row.find_all("td"):
        for label in item.find_all("div"):
            infos.append(label.text)
        infos.append(item.text)
    data.append(infos[3:5] + infos[13:14] + infos[16:17] + infos[20:])
print(data)
# Outputs
[['La Guaira', 'América Cali', '3.25', '3.20', '13.25X3.2022.25',
'1', '', '1', '31%', 'wins 57%'], ['Hapoel Holon', 'Burgos', '2.75',
'1.40', '1', '36%', 'wins 60%'] ...]
You now have a list of lists (data), one inner list per table row. You can build a DataFrame from it with Pandas and go further from there.
https://stackoverflow.com/questions/67406092