I'm trying to scrape the reviews from this website, but I can't get it to work.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service("C:/Users/hp/Downloads/chromedriver_win32/chromedriver.exe"))
driver.get("https://www.sephora.com/product/luminous-silk-foundation-P393401?skuId=1491380&icid2=products%20grid:p393401:product")
reviews = driver.find_elements(By.CLASS_NAME,"css-k7hahd eanm77i0")
for review in reviews:
    post = review.text
    print(post)
Posted on 2022-07-25 12:28:28
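A likely additional reason the question's code prints nothing: By.CLASS_NAME expects a single class name, but "css-k7hahd eanm77i0" contains two. Depending on the Selenium version, this either raises an invalid-selector error or silently matches nothing. An element that carries both classes is matched with a CSS selector such as .css-k7hahd.eanm77i0. The small helper below builds that selector from a class attribute string. The name class_names_to_css is hypothetical, and it assumes the class names need no CSS escaping.

```python
def class_names_to_css(class_attr: str) -> str:
    """Build a compound CSS class selector from a space-separated class string.

    Hypothetical helper; assumes the class names need no CSS escaping.
    """
    names = class_attr.split()  # split() also drops extra whitespace
    if not names:
        raise ValueError("class_attr contains no class names")
    # ".a.b" matches elements that carry BOTH classes a and b
    return "".join("." + name for name in names)


if __name__ == "__main__":
    selector = class_names_to_css("css-k7hahd eanm77i0")
    print(selector)  # .css-k7hahd.eanm77i0
    # With Selenium (not executed here):
    # reviews = driver.find_elements(By.CSS_SELECTOR, selector)
```

The selector fix on its own is not enough, because the reviews are also loaded lazily. These class names are auto-generated and may no longer match the live page.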
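The page loads its content dynamically in response to user interaction. For example, the reviews are only loaded after you scroll down the page. Here is one way (there are certainly others) to retrieve the reviews using selenium and BeautifulSoup: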
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
from bs4 import BeautifulSoup
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url='https://www.sephora.com/product/luminous-silk-foundation-P393401?skuId=1491380&icid2=products%20grid:p393401:product'
browser.get(url)
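# scroll to the bottom twice, pausing so the lazily loaded reviews section can render
browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
t.sleep(5)
browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
t.sleep(5)
# wait for the reviews container, then for at least one element with the review class to appear
# (eanm77i0 is an auto-generated class name and may differ on the live site)
reviews_tab = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.ID,"ratings-reviews-container")))
random_review = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.CLASS_NAME,"eanm77i0")))
# parse the container's HTML and print the text of every review
soup = BeautifulSoup(reviews_tab.get_attribute('innerHTML'), 'html.parser')
print([x.text.strip() for x in soup.select('div.eanm77i0')])
browser.quit()
In this case, there are other ways to retrieve the reviews that are more efficient than selenium.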
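The answer's code scrolls exactly twice and sleeps a fixed 5 seconds each time. A common alternative, which is not part of the original answer, is to keep scrolling until the page height stops growing. The sketch below keeps the browser calls outside the loop, so it can be tested without Chrome. The function name scroll_until_stable and its parameters are hypothetical, and the default pause and round limit are assumptions you may need to tune.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause: float = 2.0,
    max_rounds: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll repeatedly until the page height stops growing.

    Hypothetical helper. Returns the number of scroll rounds performed.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_bottom()
        sleep(pause)  # give lazily loaded content time to render
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new appeared; assume loading has finished
        last_height = new_height
    return max_rounds  # cap reached; guards against endless infinite scroll


# Usage with the answer's `browser` object, in place of the two scroll/sleep pairs
# (not executed here):
# scroll_until_stable(
#     lambda: browser.execute_script("return document.body.scrollHeight"),
#     lambda: browser.execute_script("window.scrollTo(0, document.body.scrollHeight);"),
# )
```

A height that stays unchanged can also mean the network was too slow within `pause`. The explicit WebDriverWait calls in the answer are therefore still worth keeping after the scrolling.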
https://stackoverflow.com/questions/73107765