org.apache.nutch.crawl.Crawler: NPE at org.apache.avro.util.Utf8.<init>(Utf8.java:37)

When I try to run the org.apache.nutch.crawl.Crawler class from an Eclipse launch configuration, I get the exception below. I have no idea what causes it.

java.lang.NullPointerException
    at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
    at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
13/07/30 21:14:26 INFO mapred.JobClient:  map 100% reduce 0%
13/07/30 21:14:26 INFO mapred.JobClient: Job complete: job_local_0002
13/07/30 21:14:26 INFO mapred.JobClient: Counters: 12
13/07/30 21:14:26 INFO mapred.JobClient:   FileSystemCounters
13/07/30 21:14:26 INFO mapred.JobClient:     FILE_BYTES_READ=47606
13/07/30 21:14:26 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=97164
13/07/30 21:14:26 INFO mapred.JobClient:   Map-Reduce Framework
13/07/30 21:14:26 INFO mapred.JobClient:     Reduce input groups=0
13/07/30 21:14:26 INFO mapred.JobClient:     Combine output records=0
13/07/30 21:14:26 INFO mapred.JobClient:     Map input records=0
13/07/30 21:14:26 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/07/30 21:14:26 INFO mapred.JobClient:     Reduce output records=0
13/07/30 21:14:26 INFO mapred.JobClient:     Spilled Records=0
13/07/30 21:14:26 INFO mapred.JobClient:     Map output bytes=0
13/07/30 21:14:26 INFO mapred.JobClient:     Combine input records=0
13/07/30 21:14:26 INFO mapred.JobClient:     Map output records=0
13/07/30 21:14:26 INFO mapred.JobClient:     Reduce input records=0
Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: null, jobid=null
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
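After some searching on Google, I found that the class mentioned above is deprecated in Nutch 2.x and that the $NutchHome/src/bin/crawl script should be used instead. I also tried running the crawl script from a Cygwin terminal, but without success; that attempt failed with an error in the terminal as well.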


You should copy the file $NutchHome/src/bin/crawl into the deploy directory, $NutchHome/runtime/deploy/bin, and then run the crawl script from there:

    crawl
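The copy step can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `install_crawl_script` is hypothetical, and its argument stands for the directory referred to above as $NutchHome. It also creates the deploy bin directory if it does not exist yet and marks the copied script as executable, which the steps above do not mention.

```shell
#!/bin/sh
# Sketch only. install_crawl_script is a hypothetical helper name, and its
# first argument is assumed to be the Nutch 2.x directory ($NutchHome above).
install_crawl_script() {
    nutch_home="$1"
    src="$nutch_home/src/bin/crawl"
    dest_dir="$nutch_home/runtime/deploy/bin"

    # Stop early if the source script is not where we expect it.
    if [ ! -f "$src" ]; then
        echo "crawl script not found: $src" >&2
        return 1
    fi

    # Assumption: create the deploy bin directory if it is missing.
    mkdir -p "$dest_dir" || return 1

    # Copy the script and make sure it can be executed (relevant under Cygwin).
    cp "$src" "$dest_dir/crawl" || return 1
    chmod +x "$dest_dir/crawl"
}
```

After sourcing the function, something like `install_crawl_script "$NutchHome"` would perform the copy; the script can then be started from `$NutchHome/runtime/deploy/bin` with whatever arguments your Nutch version expects.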

Hope this helps.