JavaScript: Node.js scraper isn't moving to the next page


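Hey guys, this is a follow-up to my other question. I've created a Node.js scraper that doesn't seem to want to go through the pages; it stays on the first page. My source code is below.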

const rp = require('request-promise');
const request = require('request');
const otcsv = require('objects-to-csv');
const cheerio = require('cheerio');

//URL To scrape
const baseURL = 'xxx';
const searchURL = 'xxxx';

//scrape info
const getCompanies = async () => {
  // Pagination test

  for (let index = 1; index <= 20; index = index + 1) {
    const html = await rp.get(baseURL + searchURL + index);
    const $ = await cheerio.load(html);
    console.log("Loading Pages....");
    console.log("At page number " + index);
    // end pagination test
    //const htmls = await rp(baseURL + searchURL);
    const businessMap = cheerio('a.business-name', html).map(async (i, e) => {
      const link = baseURL + e.attribs.href;
      const innerHtml = await rp(link);
      const emailAddress = cheerio('a.email-business', innerHtml).prop('href');
      const name = e.children[0].data || cheerio('h1', innerHtml).text();
      const phone = cheerio('p.phone', innerHtml).text();

      return {
        //  link,
        name,
        emailAddress: emailAddress ? emailAddress.replace('mailto:', '') : '',
        phone,
      }

    }).get();
    return Promise.all(businessMap);
  }
};
console.log("Finished Scraping.... Now Saving!")
//save to CSV
getCompanies()
  .then(result => {
    const transformed = new otcsv(result);
    return transformed.toDisk('./output.csv');
  })
  .then(() => console.log('Scrape Complete :D '));

As you can see, I've tried a couple of different ways to get this working, so any help would be much appreciated.

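It's hard to say without the URL. One thing to keep in mind is that cheerio does not return promises; only rp.get() returns a promise here, so cheerio.load(html) does not need an await. You should also use $ (the object returned by cheerio.load(html)) in your main loop, rather than calling cheerio('a.business-name', html) again.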
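Separately from the points above, the `return Promise.all(businessMap);` line in the posted code sits inside the `for` loop, so the function returns as soon as page 1 has been processed, which would explain why the scraper never reaches page 2. Below is a minimal sketch of one way to collect results across all pages before returning. The names `scrapePages`, `fetchPage`, `parsePage`, `fakeFetch` and `fakeParse` are hypothetical and not part of the original code; the stand-in callbacks replace the real HTTP requests so the sketch runs without any packages installed.

```javascript
// Hypothetical helper: walks pages 1..lastPage and collects every page's rows.
// `fetchPage(index)` and `parsePage(html)` are assumed callbacks that are not
// in the original code; each may be synchronous or asynchronous.
const scrapePages = async (fetchPage, parsePage, lastPage) => {
  const allRows = [];
  for (let index = 1; index <= lastPage; index += 1) {
    const html = await fetchPage(index);
    // parsePage may return an array of promises (one per business), so
    // resolve them all before moving on to the next page.
    const rows = await Promise.all(await parsePage(html));
    allRows.push(...rows);
  }
  // Return only after the loop has visited every page.
  return allRows;
};

// Demo with stand-in callbacks instead of real HTTP requests.
const fakeFetch = async (index) => `page-${index}`;
const fakeParse = (html) => [
  Promise.resolve({ name: `${html}-a` }),
  { name: `${html}-b` },
];

scrapePages(fakeFetch, fakeParse, 3).then((rows) => {
  console.log(rows.length + ' rows collected'); // prints "6 rows collected"
});
```

In the scraper from the question, `fetchPage` would presumably wrap `rp.get(baseURL + searchURL + index)`, and `parsePage` would call `cheerio.load(html)` without `await` and build the array with `$('a.business-name').map(...).get()`. The "Finished Scraping" log line would also fit better inside the `.then(...)` callback, since at its current position it runs before any page has been fetched.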