Javascript: scraping <head> using Node.JS?
I want to scrape the <head> of a web page with Node.JS, but I don't know how to do it. Thanks to cheerio, I can reach everything in the body, like this:
request(webUrl, function(err, resp, body){
    if(!err && resp.statusCode == 200) {
        var $ = cheerio.load(body);
        //Getting all the links 'a' from the webpage
        $('a').each(function(){
            //Getting the href attribute from the 'a' link
            var url = $(this).attr('href');
            //We keep the link only if it is root-relative (to skip 'undefined' links, subdomains and outside links such as social media links)
            if(url != undefined && url[0] == '/') {
                //We add the domain name to the url we got in order to have the full URL
                url = websiteUrl + url;
                urls.push(url);
            }
        });
        console.log(urls);
    }
});
But I can't get the head this way. I tried the following, but it only gives me the scripts in the body, not the scripts in the head:
request(webUrl, function(err, resp, body){
    if(!err && resp.statusCode == 200) {
        var $ = cheerio.load(body);
        $('script').each(function(){
            //Getting the src attribute from the 'script' tag
            var url = $(this).attr('src');
            console.log(url);
            if(url != undefined) {
                wowo.push(url);
            }
        });
        console.log(wowo);
    }
});
Can anyone help me? :'(
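Hello, and thank you for your answer. Unfortunately, even when I try parse5 or htmlparser2, this is all I get:
{ nodeName: '#document', mode: 'no-quirks', childNodes: [ { nodeName: '#documentType', name: 'html', publicId: null, systemId: null, parentNode: [Circular] }, { nodeName: 'html', tagName: 'html', attrs: [Object], namespaceURI: 'http://www.w3.org/1999/xhtml', childNodes: [Object], parentNode: [Circular] } ] }
I don't get the complete site, only the top level. That's strange.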
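One possible explanation for output that looks cut off like this: console.log() formats objects with util.inspect(), which by default stops at depth 2 and abbreviates anything deeper as [Object] or [Array]. The parsed tree may well be complete, with only its printout shortened. Here is a small sketch using only Node's built-in util module. The tree object is a made-up stand-in for a parsed document, not real parse5 output.

```javascript
const util = require('util');

// Hypothetical nested object standing in for a parsed document tree.
const tree = {
  nodeName: '#document',
  childNodes: [
    {
      nodeName: 'html',
      childNodes: [
        { nodeName: 'head', childNodes: [{ nodeName: 'title', childNodes: [] }] },
        { nodeName: 'body', childNodes: [] },
      ],
    },
  ],
};

// Default depth is 2: anything nested deeper is abbreviated.
const shallow = util.inspect(tree);

// depth: null removes the limit, so the whole structure is printed.
const full = util.inspect(tree, { depth: null });

console.log(shallow);
console.log(full);
```

The same option can be used on a real tree, for example console.log(util.inspect(document, { depth: null })), where document is whatever the parser returned. Circular references such as parentNode are still printed as [Circular] rather than followed.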