Is there a JavaScript caveat I'm missing? Scope issue?

javascript node.js http mongoose

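I've written a JavaScript object that uses the standard Node.js http get method to fetch a list of websites.

This was fairly tricky for me as a relative newbie to JS: I had to handle all the asynchrony and queue the requests one after another, so that my machine doesn't fire off thousands of requests (about 2,500) at once and, obviously, crash.

Here is the idea, and how it currently works: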

var scraper = new WebsiteScraper();
mongooseDocs.forEach(function(doc){ 

    var callback = function(outputFromWebPage, done){

        var email = /*try find email*/

        if(email){
            doc.email = email;
            doc.save(function(){
                done();
            })
        }else{
            done();
        }

    }
    scraper.scrape(doc.website, callback);
})

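// Core modules used by WebsiteScraper below.
var http = require('http');
var https = require('https');

function WebsiteScraper() {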

    this.active = true;
    this.queue = [];

    /** Add url to queue */
    this.scrape = (function (url, callback) {
        //console.log(url);
        this.queue.push({url: url, callback: callback});
    }).bind(this);

    /** Execute the scrape */
    this.execute = (function (url, done) {



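        // Bail out early for entries that are not http(s) URLs, and return
        // so that no request is attempted and done() is not called twice.
        if (url.indexOf("http") === -1) return done(false);

        // Pick the module that matches the URL scheme.
        var protocol = /^https:/.test(url) ? https : http;
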
        var req = protocol.get(url, (function (response) {

            var output = '';

            console.log("request made");

            response.on('error', function (error) {
                console.log(error);
                done(false);
            })
            response.on('data', function (chunk) {
                output += chunk;
            })
            response.on('end', (function () {
                console.log("Request complete", url);
                done(output);

            }).bind(this))
        }).bind(this));

        req.setTimeout(20000, function () {
            console.log("Timed out");
            done(false);
        });

        req.on('error', function (e) {
            console.log("Got error: " + e.message);
            done(false);
        })

        req.end();


    }).bind(this);

    this.check = (function () {

        if (this.queue[0] && this.queue[0].callback) {

            this.execute(this.queue[0].url, (function (output) {
                this.queue[0].callback(output, (function(){
                    this.queue.shift();
                    this.check();
                }).bind(this));
                console.log(this.queue.length, ' left in queue');
            }).bind(this));

        }
        else {
            setTimeout(this.check, 100);
        }


    }).bind(this);

    this.check();


}

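Essentially: loop over the records and add each one to the list. As long as there are URLs in the queue, the website scraper works through the list internally and performs the requests. When a request completes, it runs the user-defined callback, and only once that has finished does it start the next request.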
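
The queueing scheme described above can also be sketched without polling. This is a minimal sketch under stated assumptions, not the original implementation: `SerialQueue`, `once`, and the `worker(job, done)` signature are hypothetical names introduced for illustration. Each job is taken off the queue and held in a local variable before it runs, so a late or duplicate completion cannot be delivered to a different queue entry, and `once` makes repeated `done` calls harmless.

```javascript
// Wrap `fn` so that only its first invocation has any effect.
function once(fn) {
    var called = false;
    return function () {
        if (called) return;
        called = true;
        return fn.apply(this, arguments);
    };
}

// Hypothetical serial queue: runs one job at a time via `worker(job, done)`.
function SerialQueue(worker) {
    this.worker = worker;
    this.jobs = [];
    this.running = false;
}

// Add a job to the end of the queue; start draining if nothing is in flight.
SerialQueue.prototype.push = function (job) {
    this.jobs.push(job);
    if (!this.running) this.next();
};

// Run the next job, or go idle when the queue is empty.
SerialQueue.prototype.next = function () {
    var self = this;
    if (this.jobs.length === 0) {
        this.running = false;
        return;
    }
    this.running = true;
    var job = this.jobs.shift(); // captured locally, never re-read from the queue
    this.worker(job, once(function () {
        self.next();
    }));
};
```

In a scraper, the worker would perform the HTTP request for `job.url`, pass the page to that job's own callback, and call `done` when the callback has finished. Jobs can be pushed while the queue is running, including from inside the worker, which is one way to append links discovered on a page.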

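My scraper works and finds emails, but when it moves on to the next record and no email address is found there, it saves the previous record's email. I've tried all sorts of different bindings, and deleting the email variable once it is done.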
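
Because the email lookup is elided in the code above, the cause of the stale value cannot be determined from the question alone. As one possibility only, the sketch below shows how a variable declared outside the callback keeps its previous value when a page has no match, while a variable declared inside the callback does not. `findEmail`, `leakyCallback`, and `scopedCallback` are hypothetical names, and the regular expression is a deliberately simple placeholder rather than a real email validator.

```javascript
// Hypothetical helper: first email-like string in `text`, or null.
// The regex has no `g` flag, so it carries no state between calls.
function findEmail(text) {
    var match = /[\w.+-]+@[\w-]+(\.[\w-]+)+/.exec(String(text || ''));
    return match ? match[0] : null;
}

// Leaky variant: `sharedEmail` lives outside the callback, so a page
// without a match leaves the previous page's value in place.
var sharedEmail;
function leakyCallback(doc, page) {
    var found = findEmail(page);
    if (found) sharedEmail = found;
    if (sharedEmail) doc.email = sharedEmail;
}

// Scoped variant: `email` is created fresh on every call, so nothing
// carries over from one record to the next.
function scopedCallback(doc, page) {
    var email = findEmail(page);
    if (email) doc.email = email;
}
```

If the real lookup behaves like the leaky variant, declaring the variable with `var` inside the callback would be the fix. If it already looks like the scoped variant, the stale value must be coming from somewhere else, for example a completion callback that fires more than once.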

This has been bugging me for about a month, please help :D

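Hmm, shouldn't scraper.scrape(urlToScrape, callback); be inside the forEach body?

My bad. It is in my actual code.

Can I recommend you use it to handle most of the asynchrony? I mean, it's great that you tried it yourself! That's how you learn things. :) But for serious work, I'd recommend it.

Hey, yes, thanks :) I've used it now, but I want to be able to dynamically add links to the end of the queue if I find links on a page that I think are relevant. I know I could do this with async too, but not in one long dynamic queue. Unless I'm wrong, which I may well be :) I was hoping to learn something by doing it this way @RobertRossmann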