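I recently converted a script from Puppeteer to puppeteer-cluster, and while testing several pages concurrently I noticed some strange results.
In essence, I load a page, iterate over the product options on it, and collect the price of every product variant.
One particular product has about 9 variants. Sometimes I capture all 9 correctly, while on the next test run it may return only 2 or 3.
Any help would be greatly appreciated!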
const puppeteer = require('puppeteer');
const { Cluster } = require('puppeteer-cluster');
const Product = require('../utils/product')
const config = require('../config/config.json')
const selectors = config.productData;

(async () => {
    const urls = [
        {link: ...},
        {link: ...},
        {link: ...}
    ]
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 5,
        puppeteerOptions: {
            headless: false
        },
    });
    await cluster.task(async ({ page, data: url }) => {
        //instantiate a new product object
        const product = new Product();
        await page.goto(url, { waitUntil: 'load' });
        const skuprice = await page.$eval(selectors.price, element => element.innerText);
        console.log('Sku Price:' + skuprice)
        //deal with variants
        const options = await page.$$eval(selectors.variant, elements => elements.map(element=>element.id))
        if (options.length > 0) {
            //set up a variants array
            for (let index = 0; index < options.length; index++) {
                const element = options[index];
                await page.waitForSelector(`#${element}`);
                await page.$eval(`#${element}`, radio => radio.click());
                await page.waitForTimeout(500);
                const variantprice = await page.$eval(selectors.price, element => element.innerText);
                console.log('Variant Price:' + variantprice)
            }
        }
    });
    urls.forEach(url => {
        cluster.queue(url.link);
    })
    // many more pages
    await cluster.idle();
    await cluster.close();
})();

Posted on 2022-02-13 16:26:10
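Dynamic JavaScript pages should be scraped only once all the elements you need are visible.
You can use either of the following approaches:
1. Wait until the selector is visible, using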
await page.waitForSelector(selector, {visible: true, timeout: 0})
2. Wait for a fixed amount of time, but this is flakier and more error-prone.
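As a rough illustration of the first approach: `timeout: 0` disables Puppeteer's own timeout, so the wait never gives up by itself if the selector does not appear. The sketch below uses a bounded timeout instead and reports whether the element became visible. `waitUntilVisible` and the 10-second default are hypothetical choices, and the check on `err.name === 'TimeoutError'` assumes the error name Puppeteer uses for its timeout errors.

```javascript
// Hypothetical helper (not part of Puppeteer): waits for `selector` to become
// visible, returning true on success and false if the wait times out.
async function waitUntilVisible(page, selector, timeoutMs = 10000) {
  try {
    await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
    return true;
  } catch (err) {
    // Assumption: Puppeteer's timeout errors carry the name 'TimeoutError'.
    if (err && err.name === 'TimeoutError') {
      return false;
    }
    // Anything else (closed page, invalid selector, ...) is rethrown.
    throw err;
  }
}
```

Inside the task, a `false` result could be logged together with the URL and the variant id, which makes it easier to see which variants were skipped in a given run.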
You can simplify and rewrite your code as follows:
await page.waitForSelector(`#${element}`, {visible: true, timeout: 0})
await page.click(`#${element}`)
/* await page.waitForTimeout(500) <= prone to error, use line below */
await page.waitForSelector(selectors.price, {visible: true, timeout: 0})
const variantprice = await page.$eval(selectors.price, element => element.innerText)

Posted on 2022-02-14 10:29:35
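For anyone else searching for an answer: it appears that some of my CSS selectors did not carry over after the page refreshed.
After rereading the project documentation, I added the following: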
// Event handler to be called in case of problems
cluster.on('taskerror', (err, data) => {
    console.log(`Error crawling ${data}: ${err.message}`);
});

https://stackoverflow.com/questions/71101328
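Building on the `taskerror` handler, here is a minimal sketch that also records each failure so the affected URLs can be inspected or queued again after `cluster.idle()`. `trackTaskErrors` is a hypothetical helper, not part of puppeteer-cluster; it only assumes that the cluster emits `taskerror` with `(err, data)` as in the handler shown in this answer.

```javascript
// Hypothetical helper: logs task errors and keeps them in an array.
// `cluster` is expected to be an event emitter such as a puppeteer-cluster instance.
function trackTaskErrors(cluster) {
  const failures = [];
  cluster.on('taskerror', (err, data) => {
    failures.push({ data, message: err.message });
    console.log(`Error crawling ${data}: ${err.message}`);
  });
  // The same array is filled in as errors arrive.
  return failures;
}
```

Called once right after `Cluster.launch(...)`, the returned array shows after `cluster.idle()` which queued URLs failed, which can help tell missing variants caused by thrown errors apart from ones caused by timing.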