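I am trying to scrape all the bikes from this page: https://www.reconpowerbikes.com/recon-bikes/, but it only lists the names, without prices. I would like to click the "Shop Now" button for each bike on this page and then go to each bike's page to get the current price (the bikes are rotated regularly). How can I do this with Selenium?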

!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#set up Chrome driver
options=webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
#Define web driver as a Chrome driver
#Selenium 4 no longer accepts the driver path as a positional argument; chromedriver is found on PATH (/usr/bin)
driver=webdriver.Chrome(options=options)
driver.implicitly_wait(10)
URL='https://www.reconpowerbikes.com/recon-bikes/'
driver.get(URL)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//div/div[@class="blaze-pagination"]/button[@class=""]'))).click()

Posted on 2022-12-02 00:32:55
You can extract the URLs either with Beautiful Soup from the page source or with JavaScript. Here is the JavaScript version.
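Before the JavaScript version, here is a rough sketch of the page-source route for comparison. It uses Python's standard-library `html.parser` rather than Beautiful Soup, purely to avoid an extra install. The class name `blaze-slider__button` is taken from the JavaScript in this answer. The helper names `ButtonLinkParser` and `extract_bike_urls` and the sample HTML are made up for illustration, and the sketch assumes each "Shop Now" button carries its own `href`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ButtonLinkParser(HTMLParser):
    """Collect href values from tags whose class list contains a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Attribute values can be None (e.g. a bare `class`), so fall back to "".
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.css_class in classes and href:
            self.links.append(href)


def extract_bike_urls(page_source, base_url):
    """Return absolute URLs for every matching button found in the HTML."""
    parser = ButtonLinkParser("blaze-slider__button")
    parser.feed(page_source)
    # Resolve relative links against the listing page URL.
    return [urljoin(base_url, href) for href in parser.links]


# Made-up fragment that only mimics the class names used in this answer.
sample = """
<div class="blaze-slider__description">
  <a class="blaze-slider__button" href="/example-bike/">Shop Now</a>
</div>
"""
bike_urls = extract_bike_urls(sample, "https://www.reconpowerbikes.com/recon-bikes/")
print(bike_urls)
```

With the driver from the question, the call would be `extract_bike_urls(driver.page_source, URL)`. The JavaScript version from the answer follows.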
result = driver.execute_script('''
allbikes=document.querySelectorAll(".blaze-slider__description")
result=[]
for (var i = 0; i < allbikes.length; i++) {
let bike=allbikes[i]
let bike_url=bike.getElementsByClassName("blaze-slider__button")[0]
//console.log(bike_url.getAttribute("href"))
result.push(bike_url.getAttribute("href"))
}
return result
''')
for bike in result:
    print(bike)
#for each bike URL, access the page and then get the price

JavaScript: get all the slider divs, then read the href attribute from each one.
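The last step (visiting each bike URL and reading the price) is left as a comment above. A minimal sketch of it might look like the following. It rests on assumptions that are not in the answer: `price_selector` is a hypothetical CSS selector that you would need to find by inspecting a product page, prices are assumed to be shown with a `$` sign, and the helper names `parse_price` and `collect_prices` are invented here.

```python
import re
from urllib.parse import urljoin


def parse_price(text):
    """Return the first dollar amount in text as a float, or None if absent."""
    match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


def collect_prices(driver, base_url, hrefs, price_selector):
    """Visit each bike page and return a dict of {absolute_url: price or None}.

    price_selector is a placeholder: the real selector is not known here.
    """
    # Imported here so parse_price can be used without Selenium installed.
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    prices = {}
    for href in hrefs:
        url = urljoin(base_url, href)  # handles relative hrefs
        driver.get(url)
        try:
            element = driver.find_element(By.CSS_SELECTOR, price_selector)
            prices[url] = parse_price(element.text)
        except NoSuchElementException:
            # No price element on this page (selector wrong or layout differs).
            prices[url] = None
    return prices
```

With the `driver`, `URL` and `result` from above, the call would be something like `collect_prices(driver, URL, result, ".price")`, where `".price"` is only a stand-in for the real selector. Because `implicitly_wait(10)` is set in the question's setup, `find_element` already waits up to 10 seconds for the price element to appear.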
https://stackoverflow.com/questions/74649391