
Ruby on rails: the job failed

Tags: ruby-on-rails, solr, web-crawler, nutch

I am having a problem running a Nutch injection. Below is the command I am running:

bin/nutch inject bin/crawl/crawldb bin/url

After running the above command, I get the following error:

Injector: starting at 2014-04-02 13:02:29
Injector: crawlDb: bin/crawl/crawldb
Injector: urlDir: bin/urls/seed.txt
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 2
Injector: total number of urls injected after normalization and filtering: 0
Injector: Merging injected urls into crawl db.
Injector: overwrite: false
Injector: update: false
Injector: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:294)
    at org.apache.nutch.crawl.Injector.run(Injector.java:316)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:306)
This is my first run. I have checked that Solr and Nutch are installed correctly.

The following details are from the log file:

java.io.IOException: The temporary job-output directory file:/usr/share/apache-nutch-1.8/bin/crawl/crawldb/1639805438/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.MapFileOutputFormat.getRecordWriter(MapFileOutputFormat.java:46)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:449)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:491)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2014-04-02 12:54:46,251 ERROR crawl.Injector - Injector: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:294)
    at org.apache.nutch.crawl.Injector.run(Injector.java:316)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:306)
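The log lines "urlDir: bin/urls/seed.txt" and "total number of urls injected after normalization and filtering: 0" also make it worth confirming that the seed file exists and contains valid URLs, one per line. A minimal sketch of setting one up (the directory name and URL here are placeholders, not taken from the original post):

```shell
# create a seed directory with one URL per line for Nutch to inject
mkdir -p urls
printf 'http://example.com/\n' > urls/seed.txt
cat urls/seed.txt
```

If the seed file is empty, or every URL in it is rejected by the URL filters, the injector has nothing to write and the job can fail downstream.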

I ran the injection with the command bin/nutch inject bin/crawl/crawldb bin/url

instead of bin/nutch inject crawl/crawldb bin/url

and that resolved the error.
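In other words, inject takes the crawldb directory as its first argument and the seed directory as its second, and the crawldb path must point to a location the job can actually create and write. A hedged sketch of the invocation shape, using the paths from this thread:

```
# usage: bin/nutch inject <crawldb> <url_dir>
bin/nutch inject bin/crawl/crawldb bin/url
```

The "temporary job-output directory ... doesn't exist" message in the log is consistent with the crawldb argument resolving to a path the local job runner could not create output under.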

For fetching the URLs, I made changes to the regex-urlfilter.txt file, and now I can fetch URLs.
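The exact edits the poster made are not shown. For illustration only, a typical conf/regex-urlfilter.txt that accepts a single domain might look like the following (example.com is a placeholder, not the poster's site):

```
# skip image and other binary file extensions
-\.(gif|GIF|jpg|JPG|png|PNG|ico|css|zip|gz|exe)$
# accept URLs under the placeholder domain
+^https?://([a-z0-9-]+\.)*example\.com/
# reject everything else
-.
```

Rules are applied top to bottom; the first matching + or - prefix decides whether a URL is kept, so a trailing "-." line rejects anything not explicitly accepted above it.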

Make sure there are no syntax errors in any of your Nutch configuration files.
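One low-tech way to catch such errors is to run each XML config file through an XML parser before starting a crawl. A sketch using Python's standard-library parser on a deliberately malformed stand-in file (the file name is a placeholder, not a real Nutch config):

```shell
# write a malformed stand-in for a Nutch XML config file
printf '<configuration><property></configuration>' > bad-nutch-site.xml

# parse it; a well-formed file prints OK, a broken one reports the error
python3 -c "import sys, xml.dom.minidom as m; m.parse(sys.argv[1]); print('OK')" \
  bad-nutch-site.xml 2>/dev/null || echo "syntax error in bad-nutch-site.xml"
```

The same one-liner can be pointed at conf/nutch-site.xml or any other file under conf/ in a real Nutch install.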

According to your logs, you have a permissions problem. Most likely this job does not have permission to create folders inside /usr/.. @mistryon

Thank you for the reply. I changed the permissions as you suggested, but I am still getting the same error.

I have solved the above error, but Nutch is not fetching URLs from the seed file. Can anyone help?

How did you solve it? Please update the question.