How to make a Node.js function return a value

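The following code is a modification of an existing example. It basically fetches some HTML, prints a list of links, and stores them in a variable: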

crawl = function(host) {
    var select = require('soupselect').select,
        htmlparser = require("htmlparser"),
        http = require('http'),
        sys = require('sys');

    // fetch some HTML...
    var http = require('http');
    var client = http.createClient(80, host);
    var request = client.request('GET', '/',{'host': host});

    var newPages = []

    request.on('response', function (response) {
        response.setEncoding('utf8');

        var body = "";
        response.on('data', function (chunk) {
            body = body + chunk;
        });

        response.on('end', function() {

            // now we have the whole body, parse it and select the nodes we want...
            var handler = new htmlparser.DefaultHandler(function(err, dom) {
                if (err) {
                    sys.debug("Error: " + err);
                } else {

                    // soupselect happening here...
                    var titles = select(dom, 'a.title');

                    sys.puts("Top stories from reddit");
                    titles.forEach(function(title) {
                        sys.puts("- " + title.children[0].raw + " [" + title.attribs.href + "]\n");
                        newPages.push(title.attribs.href);
                    })
                }
            });

            var parser = new htmlparser.Parser(handler);
            parser.parseComplete(body);
        });
    });
    request.end();
}
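What I really want is for this function to return newPages. I would like to be able to write newPages = crawl(host); the problem is that I am not sure whether this makes sense, or where to put the return statement. I can see that newPages has content before the request ends, but it is empty after the request ends.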


How do I make this function return newPages?

Felix is right: you can't. This is the closest you can get.

Change your function signature to

crawl = function(host, done)
and update your function body to:

titles.forEach(function(title) {
    sys.puts("- " + title.children[0].raw + " [" + title.attribs.href + "]\n");
    newPages.push(title.attribs.href);
});
// call done once, after every link has been collected
done(newPages);
Then you can call crawl like this:

var processNewPages = function(pages){
// do something with pages here
...
};

crawl(host, processNewPages);
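
To see why a plain return statement cannot work here, the following minimal sketch contrasts the two approaches. It assumes a hypothetical fetchLinks helper that stands in for the HTTP request and HTML parsing and delivers a fixed list of links after a short delay; the host name is also just a placeholder.

```javascript
// Hypothetical stand-in for the HTTP request and HTML parsing:
// it delivers a fixed list of links asynchronously.
function fetchLinks(host, callback) {
    setTimeout(function () {
        callback(['http://' + host + '/a', 'http://' + host + '/b']);
    }, 10);
}

// Broken: the function returns before the callback has run.
function crawlReturning(host) {
    var newPages = [];
    fetchLinks(host, function (links) {
        links.forEach(function (link) { newPages.push(link); });
    });
    return newPages; // still empty at this point
}

// Working: the result is handed to a callback once it is complete.
function crawlWithCallback(host, done) {
    fetchLinks(host, function (links) {
        done(links.slice());
    });
}

console.log(crawlReturning('example.com').length); // 0

crawlWithCallback('example.com', function (pages) {
    console.log(pages.length); // 2
});
```

The return in crawlReturning runs immediately, while the links only arrive on a later turn of the event loop, so the caller has to receive them through the callback.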

I like to use the request, cheerio, and async modules to crawl websites. This code is shorter and, I think, more readable.

var request = require('request');
var cheerio = require('cheerio');
var async   = require('async');

function crawl(url, contentSelector, linkSelector, callback) {
    var results = [];
    var visited = {};

    var queue = async.queue(crawlPage, 5); // crawl 5 pages at a time
    // called when the queue is empty; hand the collected results to the caller
    // (property style of older async releases; async 3.x registers it with queue.drain(fn))
    queue.drain = function() { callback(results); };

    function crawlPage(url, done) {
        // make sure to visit each page only once
        if (visited[url]) return done(); else visited[url] = true;

        request(url, function(err, response, body) {
            if (!err) {
                var $ = cheerio.load(body); // "jQuery"
                results = results.concat(contentSelector($)); // add something to the results
                queue.push(linkSelector($)); // add links found on this page to the queue
            }
            done();
        });
    }

    queue.push(url); // start with the first page
}

function getStoryTitles($) {
    // .get() converts the cheerio collection into a plain array
    return $('a.title').map(function() { return $(this).text(); }).get();
}

function getStoryLinks($) {
    return $('a.title').map(function() { return $(this).attr('href'); }).get();
}

crawl('http://www.reddit.com', getStoryTitles, getStoryLinks, function(stories) {
    console.log(stories); // all stories!
});
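In the end you get an array of all the stories, which is probably what you wanted in the first place; it is just a different syntax. You could also update your own function to behave similarly, as AndyD suggested.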

In the future you will be able to use generators, which let you get the stories without callback functions and are closer to what you wanted.

function* crawl(url) {
    // do stuff
    yield story;
}

var crawler = crawl('http://www.reddit.com');
var firstStory = crawler.next().value;  // next() returns { value, done }
var secondStory = crawler.next().value;
// ...
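
Generators (ES2015) and async functions (ES2017) have since become standard JavaScript, so the idea above can be tried directly. The following is only a sketch: fetchStories is a hypothetical stand-in for a real request, iterateStories and main are names chosen for illustration, and the URL is a placeholder.

```javascript
// Hypothetical stand-in for fetching and parsing a page.
function fetchStories(url) {
    return new Promise(function (resolve) {
        setTimeout(function () {
            resolve([url + '/story-1', url + '/story-2']);
        }, 10);
    });
}

// A generator hands out already-fetched stories one at a time.
function* iterateStories(stories) {
    for (const story of stories) {
        yield story;
    }
}

async function main() {
    // await pauses main() until the promise resolves; no callback is needed here
    const stories = await fetchStories('http://example.com');
    const crawler = iterateStories(stories);
    console.log(crawler.next().value); // first story
    console.log(crawler.next().value); // second story
    console.log(crawler.next().done);  // true: nothing left to yield
}

main();
```

The work is still asynchronous: await only makes the code inside main read sequentially, and main itself returns a promise rather than the stories.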

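You can't. If you could, callbacks would not be necessary. There is an explanation of the difference between synchronous and asynchronous code; even though it focuses on Ajax, the solution applies to any case of asynchronous code execution.

How can I run crawl N times through processNewPages without errors, where N is a fixed number? I am trying to avoid recursion. Is that possible?

See mak's code below; it uses a queue and does not recurse.

Very nice, especially the use of async.queue. In my opinion, if you do Node.js work, you need to know async.js.

I have used cheerio, async, and request before; it is a great combination.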
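
For the question in the comments about crawling a fixed number of pages without recursion, one possible approach is a plain array used as a work queue inside an async function. This is a sketch under stated assumptions, not the method from the answers: fakeSite and getLinks are hypothetical stand-ins for real pages and a real link extractor, and unlike async.queue this version fetches one page at a time.

```javascript
// Hypothetical link graph standing in for real pages.
var fakeSite = {
    '/': ['/a', '/b'],
    '/a': ['/b', '/c'],
    '/b': ['/'],
    '/c': []
};

// Hypothetical async link fetcher; a real one would request and parse the page.
function getLinks(url) {
    return new Promise(function (resolve) {
        setTimeout(function () { resolve(fakeSite[url] || []); }, 5);
    });
}

// Visit at most maxPages pages, breadth-first, with an array as the queue.
async function crawlN(start, maxPages) {
    var queue = [start];
    var visited = [];
    while (queue.length > 0 && visited.length < maxPages) {
        var url = queue.shift();
        if (visited.indexOf(url) !== -1) continue; // skip pages already seen
        visited.push(url);
        var links = await getLinks(url);
        links.forEach(function (link) {
            if (visited.indexOf(link) === -1) queue.push(link);
        });
    }
    return visited;
}

crawlN('/', 3).then(function (pages) {
    console.log(pages); // [ '/', '/a', '/b' ]
});
```

The loop replaces the recursive call: each page's links are appended to the queue, and the maxPages check bounds how many pages are visited.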