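I'm trying to scrape the product names from https://maanesten.com/product-category/accessories-2/hair-claws/. It turns out the page source only lists 20 products. It doesn't match the dynamic view I see on the website and in Chrome's Inspect panel, which means the rest of the details are buried somewhere else. How should I go about this?
Here is my current code: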
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

DRIVER_PATH = 'pathto/chromedriver'
options = Options()
options.add_argument("--headless=new")  # Selenium 4 replacement for options.headless = True
options.add_argument("--window-size=1920,1200")
# Selenium 4 takes the driver path via Service instead of executable_path
driver = webdriver.Chrome(service=Service(DRIVER_PATH), options=options)
driver.get("https://maanesten.com/product-category/accessories-2/hair-claws/")
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@id='content']")))
for element in elements:
    print(element.text)
driver.quit()
```

As expected, it only returns 20 products. The output stops where the `ul` in this part of the source ends, which is also where the invisible pagination begins:

```html
<li class="last original-cat-id-loop-346 original-prod-id-loop- post-118264 product type-product status-publish has-post-thumbnail product_cat-aw20 product_cat-hair-claws taxable shipping-taxable purchasable product-type-simple product-cat-aw20 product-cat-hair-claws outofstock">
<a href="https://maanesten.com/product/helo-hairclaw-peach-sky/">
<a href="https://maanesten.com/product/helo-hairclaw-peach-sky/">
<img src="/wp-content/themes/maanesten/images_optimized/126044-shop_catalog-product_page.jpg"><p class="wc-new-badge"><span>New</span></p><h3>Helo Hairclaw Peach Sky</h3>
</a>
</a>
</li>
</ul>
<nav class="woocommerce-pagination">
<ul class='page-numbers'>
<li><span class='page-numbers current'>1</span></li>
<li><a class='page-numbers' href='https://maanesten.com/product-category/accessories-2/hair-claws/page/2/'>2</a></li>
<li><a class='page-numbers' href='https://maanesten.com/product-category/accessories-2/hair-claws/page/3/'>3</a></li>
<li><a class='page-numbers' href='https://maanesten.com/product-category/accessories-2/hair-claws/page/4/'>4</a></li>
<li><span class="page-numbers dots">…</span></li>
<li><a class='page-numbers' href='https://maanesten.com/product-category/accessories-2/hair-claws/page/8/'>8</a></li>
<li><a class='page-numbers' href='https://maanesten.com/product-category/accessories-2/hair-claws/page/9/'>9</a></li>
<li><a class='page-numbers' href='https://maanesten.com/product-category/accessories-2/hair-claws/page/10/'>10</a></li>
<li><a class="next page-numbers" href="https://maanesten.com/product-category/accessories-2/hair-claws/page/2/">→</a></li>
</ul>
</nav>
</div>
```

How can I get the rest of the products with Selenium in Python? Any insight is appreciated.
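The pagination markup quoted above suggests the same category is also served as ordinary numbered pages (`.../page/2/` up to `.../page/10/`). If so, one option that needs no browser at all is to follow the `next page-numbers` link from page to page. The sketch below uses only the standard library. It assumes every page uses the same markup as the snippet (each product is an `<li>` whose class list contains `product`, with its name in an `<h3>`) and that the last page has no `next` link. `ProductPageParser`, `parse_page` and `scrape_all` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

START_URL = "https://maanesten.com/product-category/accessories-2/hair-claws/"


class ProductPageParser(HTMLParser):
    """Collect product names (<h3> text inside <li class="... product ...">)
    and the href of the "next page-numbers" pagination link, if present."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.next_href = None
        self._in_product = False
        self._in_h3 = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "product" in classes:
            self._in_product = True
        elif tag == "h3" and self._in_product:
            self._in_h3 = True
            self._buf = []
        elif tag == "a" and {"next", "page-numbers"} <= set(classes):
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            self._in_h3 = False
            name = "".join(self._buf).strip()
            if name:
                self.names.append(name)
        elif tag == "li":
            self._in_product = False

    def handle_data(self, data):
        if self._in_h3:
            self._buf.append(data)


def parse_page(html, base_url):
    """Return (product names, absolute URL of the next page or None)."""
    parser = ProductPageParser()
    parser.feed(html)
    parser.close()
    next_url = urljoin(base_url, parser.next_href) if parser.next_href else None
    return parser.names, next_url


def scrape_all(start_url=START_URL, max_pages=50):
    """Follow the "next" links from start_url and collect every product name."""
    names, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        # Browser-like User-Agent; some sites reject urllib's default one.
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        page_names, url = parse_page(html, url)
        names.extend(page_names)
    return names
```

Calling `scrape_all()` fetches the pages one by one. The `seen` set and `max_pages` cap are there so that a malformed or looping `next` link cannot make it run forever.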
Posted on 2020-12-20 11:41:23
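The page seems to load the first 20 products up front. Some JS code then fetches more products as you scroll, so the rest are loaded dynamically. I was able to work out the XHR calls that fetch these products.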
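The answer doesn't list the XHR calls themselves. Besides watching the Network tab in Chrome DevTools while scrolling, one way to find them is to have ChromeDriver record DevTools network events in its `performance` log and pick out the XHR/Fetch requests. A sketch, assuming Selenium 4.6+ (so `webdriver.Chrome()` can locate chromedriver by itself) and that the extra products really arrive as XHR or Fetch requests. `xhr_urls_from_performance_log` and `main` are hypothetical helper names.

```python
import json
import time


def xhr_urls_from_performance_log(entries):
    """Return the URLs of XHR/Fetch requests in ChromeDriver performance-log entries.

    Each entry's "message" field is a JSON string wrapping one DevTools event.
    """
    urls = []
    for entry in entries:
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.requestWillBeSent":
            continue
        params = event.get("params", {})
        if params.get("type") in ("XHR", "Fetch"):
            urls.append(params["request"]["url"])
    return urls


def main(url="https://maanesten.com/product-category/accessories-2/hair-claws/", scrolls=5):
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    # Ask ChromeDriver to record DevTools "Network.*" events in the performance log.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        for _ in range(scrolls):
            # Scroll to the bottom to trigger the page's lazy-loading JS.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(2)
        for xhr_url in xhr_urls_from_performance_log(driver.get_log("performance")):
            print(xhr_url)
    finally:
        driver.quit()
```

Calling `main()` prints the candidate URLs. You can then inspect a likely one in DevTools to see its method, parameters and response format before replicating it with a plain HTTP client.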

You would have to mimic those XHR calls to get the remaining products.
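If you'd rather not reverse-engineer the XHR calls, a different approach (not the one this answer describes) is to let the browser make them. Keep scrolling until the product count stops growing, then read the names. The question's code scrolls only once and immediately collects `#content`, so only the first batch is there. A sketch, assuming the lazy loading is triggered by scrolling to the bottom and that product names sit in `li.product h3` as in the HTML quoted in the question. `scroll_until_stable` and `main` are hypothetical names, and the `pause`/`patience` defaults are guesses to tune.

```python
import time

CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR


def scroll_until_stable(driver, css_selector, pause=2.0, patience=3, max_rounds=100):
    """Scroll to the bottom repeatedly until the number of elements matching
    css_selector has not grown for `patience` consecutive rounds."""
    best, stale = -1, 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page's JS time to fetch and render the next batch
        count = len(driver.find_elements(CSS, css_selector))
        if count > best:
            best, stale = count, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return driver.find_elements(CSS, css_selector)


def main(url="https://maanesten.com/product-category/accessories-2/hair-claws/"):
    # Imported here so scroll_until_stable can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--window-size=1920,1200")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        for h3 in scroll_until_stable(driver, "li.product h3"):
            print(h3.text)
    finally:
        driver.quit()
```

Waiting for several rounds without growth, rather than stopping at the first unchanged count, is meant to tolerate a slow XHR response. `max_rounds` bounds the loop if the page never settles.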
https://stackoverflow.com/questions/65376659