JavaScript: Nightmare.js web scraping not working on server


For my (open source) Node.js project, I need to implement a web scraper that searches for and downloads URLs. Currently, I am using the following:

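const Nightmare = require('nightmare')
const cheerio = require('cheerio')

exports.scrap = function(cb) {
    _callback = cb
    _downloadedLinks = 0
    let nightmare = new Nightmare({ show: false })
    const url = 'https://www.bundestag.de/services/opendata'
    // we ask nightmare to browse to the bundestag.de url and extract the whole inner html
    nightmare
        .goto(url)
        .wait('body')
        .evaluate(() => document.querySelector('body').innerHTML)
        .end()
        .then(response => {
            _downloadedLinks = 0
            let validLinks = extractLinks(response)
            _foundLinks = validLinks.length
            logger.info("[scraper] found " + validLinks.length + " valid links.")
            if (validLinks.length > 0) {
                validLinks.forEach(href => {
                    downloadFileFromHref(BT_LINK + href)
                });
            } else {
                logger.info("[scraper] did not download any files.")
                _callback()
            }
        }).catch(err => {
            logger.info("[scraper] did not download any files.")
            _callback()
        });
    // extract the links we need
    let extractLinks = html => {
        const data = [];
        const $ = cheerio.load(html);
        $('.bt-link-dokument').each(function() {
            data.push(this.attribs.href);
        });
        return data.filter(checkDocumentLink)
    }
}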
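This works perfectly when run on my local machine. However, when I run it on my Ubuntu server (AWS), there seems to be a problem. I read that this is because no graphical interface is available on my server, so I tried running Xvfb on it.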

This is my file:

When running pm2 ls, I can see that both Xvfb and my server are running:

ubuntu@ip-XXX-XX-XX-XXX:~/bundeszirkus-server/current$ pm2 ls
┌─────────────────────┬────┬─────────┬──────┬───────┬────────┬─────────┬────────┬─────┬────────────┬────────┬──────────┐
│ App name            │ id │ version │ mode │ pid   │ status │ restart │ uptime │ cpu │ mem        │ user   │ watching │
├─────────────────────┼────┼─────────┼──────┼───────┼────────┼─────────┼────────┼─────┼────────────┼────────┼──────────┤
│ Xvfb                │ 1  │ N/A     │ fork │ 26063 │ online │ 6       │ 14m    │ 0%  │ 17.5 MB    │ ubuntu │ disabled │
│ bundeszirkus-server │ 0  │ 1.0.0   │ fork │ 26057 │ online │ 6       │ 14m    │ 0%  │ 246.4 MB   │ ubuntu │ disabled │
└─────────────────────┴────┴─────────┴──────┴───────┴────────┴─────────┴────────┴─────┴────────────┴────────┴──────────┘
 Use `pm2 show <id|name>` to get more details about an app
Meanwhile, it does work when run on my local (Ubuntu) machine:

{"message":"Starting server!","level":"info","timestamp":"2020-01-11 12:52:47"}
{"message":"Starting initial scraping.","level":"info","timestamp":"2020-01-11 12:52:47"}
{"message":"[scraper] found 5 valid links.","level":"info","timestamp":"2020-01-11 12:52:49"}
{"message":"[scraper] downloading file: 19138-data.xml from href: http://www.bundestag.de/resource/blob/674998/86249f57e79b8308e820d6581e7e2a95/19138-data.xml","level":"info","timestamp":"2020-01-11 12:52:49"}
{"message":"[scraper] downloading file: 19136-data.xml from href: http://www.bundestag.de/resource/blob/674328/0e9d258d50d08923fe6d6ad1381bdb3f/19136-data.xml","level":"info","timestamp":"2020-01-11 12:52:49"}
{"message":"[scraper] downloading file: 19137-data.xml from href: http://www.bundestag.de/resource/blob/674730/2bc751b619488227c9267e3cbe12c4c3/19137-data.xml","level":"info","timestamp":"2020-01-11 12:52:49"}
{"message":"[scraper] downloading file: 19135-data.xml from href: http://www.bundestag.de/resource/blob/673576/147b80c74d6d681833568cfcf36f9670/19135-data.xml","level":"info","timestamp":"2020-01-11 12:52:49"}
{"message":"[scraper] downloading file: 19134-data.xml from href: http://www.bundestag.de/resource/blob/673116/982f9d0ec845b85bddd289ede4a589fd/19134-data.xml","level":"info","timestamp":"2020-01-11 12:52:49"}
{"message":"[scraper] finished downloading  all 5 files.","level":"info","timestamp":"2020-01-11 12:52:51"}
{"message":"Loading data.","level":"info","timestamp":"2020-01-11 12:52:51"}

I am a bit lost here and do not know how to find the missing piece. Any help is much appreciated.

After doing the following, it now works:

  • Add xvfb to the code, like so:
    const Xvfb = require('xvfb');

    let xvfb = new Xvfb();
    try {
      xvfb.startSync();
    }
    catch (e) {
      console.log(e);
    }
    // scraping
    xvfb.stopSync();
  • Change this line:
    .wait('body')
    to:
    .wait(2000)
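
One caveat with the snippet above: if the scraping step is asynchronous (as a Nightmare chain is), calling xvfb.stopSync() immediately after starting the scrape may shut the virtual display down before the scrape has finished. Here is a minimal sketch of one way to guard against that, under stated assumptions: `withDisplay` is a hypothetical helper (not part of xvfb or Nightmare), `display` is any object exposing startSync() and stopSync() as the xvfb package does, and `task` is a function that returns a promise. A stand-in display is used so the sketch runs without Xvfb installed.

```javascript
// Hypothetical helper: keep the virtual display alive for the whole async task.
// `display` needs startSync()/stopSync(), e.g. `new Xvfb()` from the xvfb package.
// `task` must return a promise (or thenable), e.g. the Nightmare chain.
async function withDisplay(display, task) {
  display.startSync();
  try {
    // Await the task so the display is not torn down mid-scrape.
    return await task();
  } finally {
    // Runs whether the task succeeded or threw.
    display.stopSync();
  }
}

// Stand-in display (an assumption for this demo) that records call order.
const events = [];
const fakeDisplay = {
  startSync: () => events.push('start'),
  stopSync: () => events.push('stop'),
};

// Pretend scrape that "finds" five links.
const demo = withDisplay(fakeDisplay, async () => {
  events.push('scrape');
  return 5;
});

demo.then(count => console.log(events.join(' > '), '| links:', count));
// prints "start > scrape > stop | links: 5"
```

With the real packages, the call would presumably look like withDisplay(new Xvfb(), () => nightmare.goto(url).wait(2000).evaluate(...).end()), so the display is only stopped once the chain has settled.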