
Nutch indexing fails with java.lang.NoSuchFieldError: INSTANCE


I am using Nutch 1.13 to crawl data and store it in Elasticsearch. I have also created some custom parse-filter and index-filter plugins. Everything was working fine.
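
For context, an index-filter plugin in Nutch 1.x implements the org.apache.nutch.indexer.IndexingFilter extension point. The sketch below only illustrates that interface; the class name ExampleIndexingFilter and the custom_title field are made up, and minor interface details can vary between Nutch releases:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.parse.Parse;

// Hypothetical indexing filter: copies the parsed page title into an extra
// field before the document reaches the configured index writer.
public class ExampleIndexingFilter implements IndexingFilter {
    private Configuration conf;

    @Override
    public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
            CrawlDatum datum, Inlinks inlinks) throws IndexingException {
        doc.add("custom_title", parse.getData().getTitle());
        return doc; // returning null would drop the document from the index
    }

    @Override
    public void setConf(Configuration conf) { this.conf = conf; }

    @Override
    public Configuration getConf() { return conf; }
}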

I updated Elasticsearch to version 5. Then the indexer-elastic plugin stopped working because of the version mismatch. From some documents I also learned that Elasticsearch 5 is only supported by Nutch 2.x and later.

However, I am stuck with this Nutch version, so I found a plugin that indexes into Elasticsearch over REST instead (the ElasticRestIndexWriter that appears in the logs below), and made changes in Nutch to include this plugin.
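
(For reference: in Nutch 1.x, including a plugin usually means placing its built directory under plugins/ and adding its id to the plugin.includes regular expression in conf/nutch-site.xml. In deploy mode the plugin additionally has to be packaged into the .job file that is shipped to the cluster, where its third-party jars compete with Hadoop's own classpath, which is one reason local mode and deploy mode can behave differently.)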

I tried crawling and indexing, and it worked in Nutch's local mode. When I tried the same in deploy mode, I got the following exception during the indexing phase:

17/11/16 10:53:37 INFO mapreduce.Job: Running job: job_1510809462003_0010
17/11/16 10:53:44 INFO mapreduce.Job: Job job_1510809462003_0010 running in uber mode : false
17/11/16 10:53:44 INFO mapreduce.Job:  map 0% reduce 0%
17/11/16 10:53:48 INFO mapreduce.Job:  map 20% reduce 0%
17/11/16 10:53:52 INFO mapreduce.Job:  map 40% reduce 0%
17/11/16 10:53:56 INFO mapreduce.Job:  map 60% reduce 0%
17/11/16 10:53:59 INFO mapreduce.Job:  map 80% reduce 20%
17/11/16 10:54:02 INFO mapreduce.Job:  map 100% reduce 100%
17/11/16 10:54:02 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_0, Status : FAILED
Error: INSTANCE
17/11/16 10:54:03 INFO mapreduce.Job:  map 100% reduce 0%
17/11/16 10:54:06 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_1, Status : FAILED
Error: INSTANCE
17/11/16 10:54:10 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_2, Status : FAILED
Error: INSTANCE
17/11/16 10:54:15 INFO mapreduce.Job:  map 100% reduce 100%
17/11/16 10:54:15 INFO mapreduce.Job: Job job_1510809462003_0010 failed with state FAILED due to: Task failed task_1510809462003_0010_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

17/11/16 10:54:15 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=804602
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=44204
HDFS: Number of bytes written=0
HDFS: Number of read operations=20
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters 
Failed reduce tasks=4
Killed map tasks=1
Launched map tasks=5
Launched reduce tasks=4
Data-local map tasks=5
Total time spent by all maps in occupied slots (ms)=39484
Total time spent by all reduces in occupied slots (ms)=16866
Total time spent by all map tasks (ms)=9871
Total time spent by all reduce tasks (ms)=16866
Total vcore-milliseconds taken by all map tasks=9871
Total vcore-milliseconds taken by all reduce tasks=16866
Total megabyte-milliseconds taken by all map tasks=40431616
Total megabyte-milliseconds taken by all reduce tasks=17270784
Map-Reduce Framework
Map input records=436
Map output records=436
Map output bytes=55396
Map output materialized bytes=56302
Input split bytes=698
Combine input records=0
Spilled Records=436
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=246
CPU time spent (ms)=3840
Physical memory (bytes) snapshot=1559916544
Virtual memory (bytes) snapshot=25255698432
Total committed heap usage (bytes)=1503657984
File Input Format Counters 
Bytes Read=43506
17/11/16 10:54:15 ERROR impl.JobWorker: Cannot run job worker!
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:94)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:87)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:352)
at org.apache.nutch.service.impl.JobWorker.run(JobWorker.java:71)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The Hadoop log is:

2017-11-16 10:54:13,731 INFO [main] org.apache.nutch.indexer.IndexWriters: Adding org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter
2017-11-16 10:54:13,801 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
    at org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter.open(ElasticRestIndexWriter.java:133)
    at org.apache.nutch.indexer.IndexWriters.open(IndexWriters.java:75)
    at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:39)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:484)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
After searching for this, I understood that it is caused by a version problem with the http JARs. The Hadoop version I am using is 2.7.2. I tried the same with Hadoop version 2.8.2, and the result was the same.
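
One way to confirm a mixed classpath is to print which jar each of the involved classes is actually loaded from. SSLConnectionSocketFactory only exists in httpclient 4.3+, while the hostname-verifier classes its static initializer references got their INSTANCE fields in a later release, so an older jar earlier on the classpath can supply a verifier class without INSTANCE and produce exactly this error. A minimal sketch (the class name HttpJarCheck is made up; run it with the same classpath as the failing reduce task):

import org.apache.http.conn.ssl.AllowAllHostnameVerifier;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;

// Hypothetical diagnostic: print the jar each class resolves from. If the
// two locations differ (e.g. one under Hadoop's lib, one from the plugin),
// the classpath mixes httpclient versions.
public class HttpJarCheck {
    public static void main(String[] args) {
        Class<?>[] classes = {
                SSLConnectionSocketFactory.class,
                AllowAllHostnameVerifier.class
        };
        for (Class<?> c : classes) {
            System.out.println(c.getName() + " -> "
                    + c.getProtectionDomain().getCodeSource().getLocation());
        }
    }
}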

Looking for a solution.

SOLVED: The problem was the older jar versions of httpcore in Hadoop 2.7.2. Removing those jars solved the problem.
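
(For anyone hitting the same thing: in Hadoop 2.7.x the old jars are typically httpclient-4.2.5.jar and httpcore-4.2.5.jar under $HADOOP_HOME/share/hadoop/common/lib/, though exact versions and locations can differ per distribution. Removing or replacing them so that only the newer httpclient/httpcore versions needed by the REST index writer remain on the task classpath resolves the NoSuchFieldError.)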