Shuffle error in Nutch when indexing to Solr

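The Nutch crawler indexed documents successfully for a while. At some point it suddenly stopped, and I don't know why. I am posting the log below; can anyone tell me the cause?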

java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:309)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:299)
    at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:134)
    at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:102)
    at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:85)
2018-08-30 03:15:54,758 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)

This is an out-of-memory error. Try adjusting the Java heap setting in solr.in.sh:

SOLR_JAVA_MEM="-Xms512m -Xmx5120m"

For me, indexing worked after this change.
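
One thing worth checking: the trace in the question shows the OutOfMemoryError being thrown inside Hadoop's LocalJobRunner during the reduce-side shuffle, which runs in the JVM started by Nutch rather than in Solr. If raising Solr's heap does not help, a possible alternative is to give the Nutch process more heap. The sketch below assumes a Nutch 1.x install whose bin/nutch script reads the NUTCH_HEAPSIZE environment variable (maximum heap in MB). The value 4000 is only an illustration and should be sized to the machine.

```shell
# Assumption: bin/nutch reads NUTCH_HEAPSIZE (max heap, in MB)
# when it builds the java command line.
# 4000 is an illustrative value, not a recommendation.
export NUTCH_HEAPSIZE=4000
echo "Nutch max heap set to ${NUTCH_HEAPSIZE} MB"
# Re-run the indexing step from this same shell so it inherits the variable.
```

The variable has to be exported in the shell (or wrapper script) that launches the crawl, otherwise the indexing job will not see it.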