Javascript: using a promise pool and Puppeteer to build a continuously growing list


I need to build a scraping tool with Puppeteer, but I'm having trouble adding items to the queue.

What I have so far:

const PromisePool = require("@supercharge/promise-pool");
const puppeteer = require("puppeteer");

const domain = process.argv[2];

let list = [];
list[0] = domain;

const run = async () => {
  const { results, errors } = await PromisePool.for(list)
    .withConcurrency(2)
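    .process(async (webpage) => {
      // Note: page and browser are not defined in this excerpt, and webpage
      // is never passed to page.goto(), so as written every iteration reads the same page.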
      links = [];

      const getData = async () => {
        return await page.evaluate(async () => {
          return await new Promise((resolve) => {
            resolve(Array.from(document.querySelectorAll("a")).map((anchor) => [anchor.href]));
          });
        });
      };

      links = await getData();

      for (var link in links) {
        var new_url = String(links[link]);
        new_url = new_url.split("#")[0];
        console.log("new url: " + new_url);
        if (new_url.includes(domain)) {
          if (new_url in list) {
            console.log("Url already exists: " + new_url);
            continue;
          }

          list[new_url] = new_url;
        } else {
          console.log("Url is external: " + new_url);
        }
      }
      browser.close();
    });
};

const mainFunction = async () => {
  const result = await run();
  return result;
};

(async () => {
  console.log(await mainFunction());
  console.log(list);
})();
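Independently of page.evaluate, two lines in the code above do not do what they appear to: list[new_url] = new_url adds a string-keyed property to the array rather than a new element (so list.length never changes), and new_url in list checks property keys such as "0", not the stored URLs. The minimal sketch below shows both behaviors in plain Node.js, plus a Set-based alternative that actually grows the array; the URLs are made-up examples.

```javascript
// Demonstrates why `list[new_url] = new_url` and `new_url in list`
// do not behave like a growing list of URLs.
const list = ["https://example.com/"];

// Assigning a non-index string key adds a property, not an array element.
list["https://example.com/a"] = "https://example.com/a";
console.log(list.length); // 1

// `in` checks property keys ("0", "https://example.com/a"), not values.
console.log("https://example.com/" in list); // false
console.log("https://example.com/a" in list); // true

// A Set records visited URLs by value; push() really appends to the array.
const seen = new Set(["https://example.com/"]);
const urls = ["https://example.com/"];
for (const url of ["https://example.com/a", "https://example.com/"]) {
  if (!seen.has(url)) {
    seen.add(url);
    urls.push(url);
  }
}
console.log(urls.length); // 2
```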
The problem is in this part:

links = [];

const getData = async () => {
  return await page.evaluate(async () => {
    return await new Promise((resolve) => {
      resolve(Array.from(document.querySelectorAll("a")).map((anchor) => [anchor.href]));
    });
  });
};

links = await getData();
page.evaluate is asynchronous and does not wait for the return value, so links is never updated for the next PromisePool process.
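For comparison, a promise-returning call that is awaited, as getData() is in the code above, does finish before the next statement runs. The minimal sketch below uses a hypothetical fakeEvaluate() as a stand-in for page.evaluate (no browser involved) to show this; it suggests the list not growing comes from how new URLs are stored and iterated rather than from a missing await.

```javascript
// Hypothetical stand-in for page.evaluate: resolves with a value after a delay.
function fakeEvaluate() {
  return new Promise((resolve) => {
    setTimeout(() => resolve(["https://example.com/a"]), 50);
  });
}

async function main() {
  // `await` suspends main() until the promise settles,
  // so `links` is already filled in when the next line runs.
  const links = await fakeEvaluate();
  console.log(links.length); // 1
  return links;
}

const done = main();
```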

I need a way to wait for the response to come back before continuing with the rest of the script.

You can retrieve the same links with a single await using

page.$$eval(selector, pageFunction[, ...args])
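This is essentially what you are trying to achieve, because the $$eval method "runs Array.from(document.querySelectorAll(selector)) within the page [context] and passes it as the first argument to pageFunction".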

For example:

const links = await page.$$eval('a', anchors => anchors.map(el => el.href));
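Building on that one-liner, a small helper can also do the filtering the question does by hand (dropping the #fragment, keeping only same-site links, skipping duplicates). This is a sketch under assumptions: getLinks and stubPage are hypothetical names, and stubPage only mimics the shape of page.$$eval (it hands the callback a fixed array of fake anchors) so the helper can be tried without a browser. With a real Puppeteer page, the callback runs inside the browser and only the array of href strings comes back.

```javascript
// Collects same-origin links from a page, without fragments or duplicates.
// `page` is anything with a Puppeteer-style $$eval(selector, pageFunction).
async function getLinks(page, origin) {
  // The callback receives the array of matched <a> elements.
  const hrefs = await page.$$eval("a", (anchors) => anchors.map((el) => el.href));
  const result = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href);
    } catch {
      continue; // skip empty or invalid hrefs
    }
    url.hash = ""; // same effect as split("#")[0]
    if (url.origin === origin) result.add(url.href);
  }
  return [...result];
}

// Hypothetical stand-in page for trying the helper without a browser.
const stubPage = {
  async $$eval(selector, pageFunction) {
    const anchors = [
      { href: "https://example.com/a#top" },
      { href: "https://example.com/a" },
      { href: "https://other.org/" },
      { href: "" },
    ];
    return pageFunction(anchors);
  },
};

getLinks(stubPage, "https://example.com").then((links) => console.log(links));
```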
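It sounds like what you want is something that keeps accepting new items as they are discovered, rather than something that runs over a static list.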
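One way to get that behavior is a hand-rolled work queue with a concurrency limit, in place of the promise pool library. The sketch below is an illustration under assumptions, not the answerer's code: crawl, normalize and fetchLinks are hypothetical names, and fetchLinks is injected so the queue logic can run against an in-memory fake site. It makes no claim about how @supercharge/promise-pool treats items added while it is running.

```javascript
// Crawl same-origin links with at most `concurrency` visits in flight,
// using a queue that keeps growing as new URLs are discovered.
// fetchLinks(url) is a hypothetical callback returning a Promise of hrefs.

// Drops the #fragment; returns null for hrefs that are not valid URLs.
function normalize(href) {
  try {
    const url = new URL(href);
    url.hash = "";
    return url.href;
  } catch {
    return null;
  }
}

function crawl(startUrl, fetchLinks, concurrency = 2) {
  const start = normalize(startUrl);
  const origin = new URL(start).origin;
  const seen = new Set([start]); // every URL ever queued
  const queue = [start]; // URLs waiting to be visited
  let active = 0; // visits currently in flight

  return new Promise((resolve, reject) => {
    const pump = () => {
      // Finished once nothing is waiting and nothing is running.
      if (queue.length === 0 && active === 0) {
        resolve([...seen]);
        return;
      }
      // Start visits until the concurrency limit is reached.
      while (active < concurrency && queue.length > 0) {
        const url = queue.shift();
        active++;
        fetchLinks(url)
          .then((hrefs) => {
            for (const href of hrefs) {
              const next = normalize(href);
              if (next && new URL(next).origin === origin && !seen.has(next)) {
                seen.add(next);
                queue.push(next); // the queue grows while the crawl runs
              }
            }
            active--;
            pump();
          })
          .catch(reject);
      }
    };
    pump();
  });
}

// Usage with a made-up in-memory site instead of a browser.
const site = {
  "https://example.com/": ["https://example.com/a", "https://other.org/", "https://example.com/b#top"],
  "https://example.com/a": ["https://example.com/", "https://example.com/c"],
  "https://example.com/b": ["https://example.com/a"],
};
const crawled = crawl("https://example.com", async (url) => site[url] || []);
crawled.then((urls) => console.log([...urls].sort()));
```

With Puppeteer, fetchLinks could open a tab with browser.newPage(), call page.goto(url), read the hrefs with page.$$eval('a', anchors => anchors.map(el => el.href)), and close that tab before returning; the browser itself would then be closed once, after crawl resolves, rather than inside each visit.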