JavaScript: building a continuously growing list with a promise pool and Puppeteer
I need to build a scraping tool with Puppeteer, but I am having trouble adding items to the queue.
Here is what I have so far:
const PromisePool = require("@supercharge/promise-pool");
const puppeteer = require("puppeteer");
const domain = process.argv[2];
let list = [];
list[0] = domain;
const run = async () => {
    const { results, errors } = await PromisePool.for(list)
        .withConcurrency(2)
        .process(async (webpage) => {
            links = [];
            const getData = async () => {
                return await page.evaluate(async () => {
                    return await new Promise((resolve) => {
                        resolve(Array.from(document.querySelectorAll("a")).map((anchor) => [anchor.href]));
                    });
                });
            };
            links = await getData();
            for (var link in links) {
                var new_url = String(links[link]);
                new_url = new_url.split("#")[0];
                console.log("new url: " + new_url);
                if (new_url.includes(domain)) {
                    if (new_url in list) {
                        console.log("Url already exists: " + new_url);
                        continue;
                    }
                    list[new_url] = new_url;
                } else {
                    console.log("Url is external: " + new_url);
                }
            }
            browser.close();
        });
};
const mainFunction = async () => {
    const result = await run();
    return result;
};
(async () => {
    console.log(await mainFunction());
    console.log(list);
})();
The problem is in this part:
links = [];
const getData = async () => {
    return await page.evaluate(async () => {
        return await new Promise((resolve) => {
            resolve(Array.from(document.querySelectorAll("a")).map((anchor) => [anchor.href]));
        });
    });
};
links = await getData();
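page.evaluate is asynchronous and does not wait for the return value, so the links are never updated for the next PromisePool process.
I need a way to wait for the response to come back and then continue with the rest of the script.

You can retrieve the same links with a single await: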
page.$$eval(selector, pageFunction[, ...args])
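This is essentially what you are trying to achieve, because the $$eval method "runs Array.from(document.querySelectorAll(selector)) within the page [context] and passes it as the first argument to pageFunction."
For example: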
const links = await page.$$eval('a', anchors => anchors.map(el => el.href));
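Once $$eval has returned the raw href strings, the remaining steps from the question (dropping the fragment, keeping only URLs that contain the domain, skipping duplicates) can run in plain Node code. The following is only a minimal sketch: collectNewUrls is a hypothetical helper name, and it assumes a Set is used to track known URLs instead of the array with string keys from the question, since the in operator on an array tests keys rather than values.

```javascript
// Hypothetical helper; not part of Puppeteer or @supercharge/promise-pool.
// hrefs:  array of strings, e.g. the result of page.$$eval('a', ...)
// domain: substring that marks a URL as internal (same test as the question)
// seen:   Set of URLs already known; it is updated in place
function collectNewUrls(hrefs, domain, seen) {
  const fresh = [];
  for (const href of hrefs) {
    const url = String(href).split("#")[0]; // drop the fragment part
    if (!url.includes(domain)) continue;    // external link, ignore it
    if (seen.has(url)) continue;            // already queued or visited
    seen.add(url);
    fresh.push(url);
  }
  return fresh; // only the URLs that were not known before this call
}
```

The return value contains just the newly discovered URLs, so the caller can append them to whatever work list it maintains.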
It also sounds like what you want is something other than a tool that runs over a static list.
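One way to get away from a static list is to manage a work queue by hand, so that URLs found while processing one page are picked up later in the same run. The sketch below is an assumption about how that could look, not the promise-pool API: crawl and fetchLinks are hypothetical names, and fetchLinks stands in for the Puppeteer part (open a page, navigate to the URL, return the result of $$eval). The internal-link test reuses the question's substring check against the start URL.

```javascript
// Sketch of a crawl loop whose work list grows while it runs.
// fetchLinks is a hypothetical async function: (url) => Promise<string[]>.
// With Puppeteer it could navigate to the url and return
// page.$$eval('a', anchors => anchors.map(el => el.href)).
function crawl(startUrl, fetchLinks, concurrency = 2) {
  const seen = new Set([startUrl]); // every URL that was ever queued
  const queue = [startUrl];         // URLs still waiting to be processed
  let active = 0;                   // fetches currently in flight

  return new Promise((resolve) => {
    const next = () => {
      // Done when nothing is waiting and nothing is running.
      if (queue.length === 0 && active === 0) {
        resolve([...seen]);
        return;
      }
      // Start work until the concurrency limit is reached.
      while (active < concurrency && queue.length > 0) {
        const url = queue.shift();
        active++;
        fetchLinks(url)
          .then((links) => {
            for (const link of links) {
              const clean = String(link).split("#")[0];
              // Same substring check as in the question.
              if (clean.includes(startUrl) && !seen.has(clean)) {
                seen.add(clean);
                queue.push(clean); // the list grows during the run
              }
            }
          })
          .catch((err) => console.error("Failed: " + url, err))
          .finally(() => {
            active--;
            next(); // pick up anything that was queued meanwhile
          });
      }
    };
    next();
  });
}
```

A call such as crawl(process.argv[2], fetchLinks, 2) would resolve with every internal URL that was discovered. A failed page is logged and skipped, so one error does not stop the rest of the crawl.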