Puppeteer: get an array of hrefs, then iterate over each href and the hrefs on that page


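I am trying to scrape data with Puppeteer in Node.js.

Currently, I am looking to write a script that will collect all the data in a given section of well.ca.

Here is the approach/logic I am trying to implement in Node.js: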

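1 - Go to the medicine and health section of the site

2 - Use a DOM selector to get an array of hrefs from

.panel-body-content

via the DOM selector

.panel-body-content a[href]

to scrape the subsections

3 - Iterate over each link (subsection) with a for loop

4 - For each subsection link, get another array of hrefs, one per product, by taking the href value of each element with the class

col-lg-5ths col-md-3 col-sm-4 col-xs-6

via

.col-lg-5ths col-md-3 col-sm-4 col-xs-6[href]

5 - Loop through each product in the subsection

6 - Collect the data for each product

So far, I have written most of the code for the steps above: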

const puppeteer = require('puppeteer');
const chromeOptions = {
  headless: false,
  defaultViewport: null,
};
(async function main() {
  const browser = await puppeteer.launch(chromeOptions);
  try {
    const page = await browser.newPage();
    await page.goto("https://well.ca/categories/medicine-health_2.html");
    console.log("::::::: OPEN WELL   ::::::::::");

    // href attribute
    const hrefs1 = await page.evaluate(
      () => Array.from(
        document.querySelectorAll('.panel-body-content a[href]'),
        a => a.getAttribute('href')
      )
    );

    console.log(hrefs1);

    const urls = hrefs1;

    for (let i = 0; i < urls.length; i++) {
      const url = urls[i];
      await page.goto(url);
    }
    const hrefs2 = await page.evaluate(
      () => Array.from(
        document.querySelectorAll('.col-lg-5ths col-md-3 col-sm-4 col-xs-6 a[href]'),
        a => a.getAttribute('href')
      )
    );


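When I try to get an array of the hrefs for each product, the array comes back empty.

How do I add a nested for loop that gets an array of all the hrefs for every product in each subsection, and then visits each product link?

What is the correct DOM selector to get all the hrefs inside the class

.col-lg-5ths col-md-3 col-sm-4 col-xs-6

with the id

product_grid_link

If I want to add a subsequent loop that gets information from each product via the product hrefs in each subsection, how would I embed that in the code?

Any help would be much appreciated.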

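Some links appear to be duplicates, so it is better to collect all the links to the final pages, deduplicate the list, and then scrape the final pages. (You can also save the final page links to a file for later use.) This script collects 5395 links (deduplicated).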

'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch({ headless: false, defaultViewport: null });
    const [page] = await browser.pages();

    await page.goto('https://well.ca/categories/medicine-health_2.html');

    // Subsection links; the Set drops duplicates.
    const hrefsCategoriesDeduped = new Set(await page.evaluate(
      () => Array.from(
        document.querySelectorAll('.panel-body-content a[href]'),
        a => a.href
      )
    ));

    const hrefsPages = [];

    // Visit each subsection and collect its product links.
    for (const url of hrefsCategoriesDeduped) {
      await page.goto(url);
      hrefsPages.push(...await page.evaluate(
        () => Array.from(
          document.querySelectorAll('.col-lg-5ths.col-md-3.col-sm-4.col-xs-6 a[href]'),
          a => a.href
        )
      ));
    }

    const hrefsPagesDeduped = new Set(hrefsPages);

    // hrefsPagesDeduped can be converted back to an array
    // and saved in a JSON file now if needed.

    for (const url of hrefsPagesDeduped) {
      await page.goto(url);

      // Scrape the page.
    }

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
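
The final loop in the script leaves the per-product scraping as a placeholder. A minimal sketch of one way to fill it in follows, assuming a hypothetical helper `scrapeAll` and a hypothetical extractor `extractTitle`; neither name comes from the thread, and the real product fields and their selectors on well.ca are not given here, so the page title stands in for them. The helper only relies on the `goto` and `evaluate` methods of the page object.

```javascript
// Hypothetical helper: visit each URL in turn and collect one record per page.
// `page` is any object with Puppeteer-style `goto(url)` and `evaluate(fn)` methods.
async function scrapeAll(page, urls, extract) {
  const results = [];
  for (const url of urls) {
    try {
      await page.goto(url);
      // `extract` runs in the browser context, so it may only use page globals.
      const data = await page.evaluate(extract);
      results.push({ url, data });
    } catch (err) {
      // Record the failure and keep going rather than aborting the whole run.
      results.push({ url, error: String(err.message || err) });
    }
  }
  return results;
}

// Assumed extractor: the page title is a stand-in for real product fields.
const extractTitle = () => ({ title: document.title });
```

Inside the script this could be called as `const products = await scrapeAll(page, hrefsPagesDeduped, extractTitle);` in place of the last loop. Catching errors per URL is a design choice: with several thousand links, one failed navigation would otherwise discard everything collected so far.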

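Very nice, it works great! Just a quick question:

.col-lg-5ths.col-md-3.col-sm-4.col-xs-6 a[href]

seems to give two links for each product. How can I get the href only from the links with the class

product_grid_link

? Thanks a lot, vsemozhetbyt.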

document.querySelectorAll('.col-lg-5ths.col-md-3.col-sm-4.col-xs-6 a[href].product_grid_link')
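
For context on why the selector in the question matched nothing: in CSS a space is a descendant combinator, and a name without a leading dot is a type (tag) selector, so `.col-lg-5ths col-md-3 col-sm-4 col-xs-6 a[href]` looks for nested `<col-md-3>`, `<col-sm-4>` and `<col-xs-6>` elements, which do not exist. Chaining the classes with dots and no spaces matches one element that carries all of them. A small sketch of that conversion, where `classAttrToSelector` is a hypothetical helper name and the class names are assumed to be plain identifiers that need no escaping:

```javascript
// Hypothetical helper: turn the value of a class attribute into a compound
// CSS selector that matches a single element carrying all of those classes.
function classAttrToSelector(classAttr) {
  return classAttr
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map(name => '.' + name)
    .join('');
}

const cell = classAttrToSelector('col-lg-5ths col-md-3 col-sm-4 col-xs-6');

// Only the anchors that also carry the product_grid_link class.
const productLinks = cell + ' a.product_grid_link[href]';

console.log(productLinks);
// .col-lg-5ths.col-md-3.col-sm-4.col-xs-6 a.product_grid_link[href]
```

`a.product_grid_link[href]` and `a[href].product_grid_link` are equivalent; the order of the class and attribute parts within one compound selector does not matter.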

Ahh, I thought so too. It works like a charm! This is indeed the solution.
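
The answer also mentions saving the deduplicated links to a file for later use. A rough sketch of that step, assuming hypothetical helpers `saveLinks` and `loadLinks`, an assumed file name, and placeholder URLs rather than real well.ca links. A `Set` has to be spread into an array first, because `JSON.stringify` serialises a `Set` as `{}`.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: deduplicate the links and write them as a JSON array.
function saveLinks(file, links) {
  fs.writeFileSync(file, JSON.stringify([...new Set(links)], null, 2));
}

// Hypothetical helper: read the JSON array back into a Set.
function loadLinks(file) {
  return new Set(JSON.parse(fs.readFileSync(file, 'utf8')));
}

// Usage with placeholder URLs and an assumed file name in the temp directory.
const file = path.join(os.tmpdir(), 'product-links.json');
saveLinks(file, [
  'https://example.com/a',
  'https://example.com/b',
  'https://example.com/a',
]);

console.log(loadLinks(file).size); // 2
```

Keeping the list on disk means the slow crawl of the subsections only has to run once; later runs can start from `loadLinks` and go straight to the product pages.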