Javascript: Puppeteer scrapes only about 200 pages and then doesn't continue (javascript, node.js, json, nodes, puppeteer)

For some reason I don't understand, my Node app stops scraping after a few minutes without any error. By the way, the site uses infinite scrolling. Here is the code:

const fs = require('fs');
const puppeteer = require('puppeteer');

(async() => {
    // start the browser
    const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
    // open a new page
    const page = await browser.newPage();
    const pageURL = 'http://www.yad4.co.il/dogs//////////////#1';
    try {
        // try to go to URL
        await page.goto(pageURL);
        console.log(`opened the page: ${pageURL}`);
        await page.setViewport({
            width: 1200,
            height: 800
        });
        await autoScroll(page);
    } catch (error) {
        console.log(`failed to open the page: ${pageURL} with the error: ${error}`);
    }
  // Find all links to dogs
  const postsSelector = '.yd-search-page .container .row .col-md-9 .yd-gallery .search-handler-yd .col-xs-12 #dogs_more .col-md-4 .yd-dog-img .yd-mask a';
  await page.waitForSelector(postsSelector);
  const postUrls = await page.$$eval(postsSelector, postLinks => postLinks.map(link => link.href));

  // Visit each page one by one
  for (let postUrl of postUrls) {

      // open the page
      try {
          await page.goto(postUrl);
          console.log('opened the page: ', postUrl);
      } catch (error) {
          console.log(error);
          console.log('failed to open the page: ', postUrl);
      }
      // get the name of the dog
      const dogSelector = '.adopt.yd-amuta .container .yd-dog-cont .col-xs-12 .adopt-head .row .col-sm-6 .adopt-breadcrumb-title h2 span';
     // await page.waitForSelector(dogSelector);
      const dogName = await page.$eval(dogSelector, dogSelector => dogSelector.innerHTML);

      // append the dog's name to a JSON file
      fs.appendFile("dogtest4.json", JSON.stringify({dogName},), function(err) {
          if (err) throw err;
          console.log("Saved!");
      });

    }
    // all done, close the browser
    await browser.close();

    async function autoScroll(page){
        await page.evaluate(async () => {
            await new Promise((resolve, reject) => {
                var totalHeight = 0;
                var distance = 100;
                var timer = setInterval(() => {
                    var scrollHeight = document.body.scrollHeight;
                    window.scrollBy(0, distance);
                    totalHeight += distance;
    
                    if(totalHeight >= scrollHeight){
                        clearInterval(timer);
                        resolve();
                    }
                }, 100);
            });
         
        });
    }    
    process.exit()
})();
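
So it does give me the data, but the amount is random: sometimes I get 115 pages, sometimes 300, sometimes only 90, and I don't understand why. Please help.

Thanks.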

I can't comment, but I suspect this may be related to hitting a memory limit, which would slow things down.


You could try adding "await" in front of fs.appendFile(…); that may work for you.
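
One caveat with this suggestion: `await` only waits on a promise, and the callback form `fs.appendFile(path, data, callback)` does not return one, so awaiting it has no effect. A minimal sketch of a promise-based variant, assuming Node's `fs.promises` API, is below. The helper names `saveRecord` and `saveAll` are hypothetical, and writing one JSON object per line is an assumption made here so the output file stays parseable.

```javascript
const fs = require('fs');

// Append one record per line (JSON Lines). The returned promise settles
// only after the append has completed, so `await` actually waits for it.
async function saveRecord(filePath, record) {
  await fs.promises.appendFile(filePath, JSON.stringify(record) + '\n');
}

// Hypothetical stand-in for the loop in the question: each write finishes
// before the next iteration starts, and before the caller moves on.
async function saveAll(filePath, dogNames) {
  for (const dogName of dogNames) {
    await saveRecord(filePath, { dogName });
  }
}
```

In the question's loop this would be called as `await saveRecord('dogtest4.json', { dogName });`. Waiting for each write may matter because `process.exit()` ends the process without waiting for callbacks that are still pending.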

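What is the browser's timeout? Set the timeout to 0 in the launch options and see whether it makes any difference. Also, add a pause of 3-5 seconds between scrolls; the site may be blocking you for what looks like malicious activity.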
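
The pause between scrolls could be sketched as follows. This is an illustration, not the asker's code: it replaces the in-page `setInterval` with a loop on the Node side, and it stops only once the page height has stopped growing for a few rounds, which is an added assumption about why the number of scraped pages varies. The option names `delayMs`, `maxIdleRounds`, and `maxRounds` are hypothetical.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Scroll to the bottom repeatedly with a pause between scrolls. Stop once
// the height has not grown for `maxIdleRounds` consecutive rounds, or after
// `maxRounds` rounds as a safety cap. `page` is expected to be a Puppeteer Page.
async function autoScroll(page, { delayMs = 3000, maxIdleRounds = 3, maxRounds = 500 } = {}) {
  let lastHeight = 0;
  let idleRounds = 0;
  let rounds = 0;
  while (idleRounds < maxIdleRounds && rounds < maxRounds) {
    // Runs in the browser: jump to the current bottom and report the height.
    const height = await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
      return document.body.scrollHeight;
    });
    if (height > lastHeight) {
      lastHeight = height; // new content was loaded since the last round
      idleRounds = 0;
    } else {
      idleRounds += 1; // nothing new appeared during the last pause
    }
    rounds += 1;
    await sleep(delayMs); // give the site time to load the next batch
  }
  return lastHeight;
}
```

It would be called as `await autoScroll(page, { delayMs: 4000 });` in place of the original `autoScroll(page)`. For the timeout part of the suggestion, `puppeteer.launch({ timeout: 0 })` disables the browser start-up timeout, while `page.setDefaultNavigationTimeout(0)` does the same for `page.goto`. Whether either setting changes the behaviour here is untested.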