Problem integrating Solr with Nutch
I am following a tutorial. I installed Solr and Nutch separately, and both work fine on their own. The problem appears when I have to integrate them. From earlier posts on this site, I gathered that the schema file might be the issue. As the tutorial says, I copied Nutch's schema.xml over Solr's schema.xml and restarted Solr. Solr then stopped because of a configuration problem. So I merged the two files instead, copying the contents of each into the other alongside what was already there. Now (as before) I get this error:
Indexer: starting at 2014-08-05 11:10:21
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
Can anyone suggest what I should do?
I am using apache-nutch-1.8 and solr-4.9.0.
Here is what my hadoop.log file looks like:
2014-08-05 12:50:05,032 INFO crawl.Injector - Injector: starting at 2014-08-05 12:50:05
2014-08-05 12:50:05,033 INFO crawl.Injector - Injector: crawlDb: -dir/crawldb
2014-08-05 12:50:05,033 INFO crawl.Injector - Injector: urlDir: urls
.
.
.
.
.
2014-08-05 13:04:21,255 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-08-05 13:04:21,286 WARN mapred.LocalJobRunner - job_local1310160376_0001
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://my-solr-url:8983/solr/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2014-08-05 13:04:21,544 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
2014-08-05 13:10:37,855 INFO crawl.Injector - Injector: starting at 2014-08-05 13:10:37
.
.
.
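In a long hadoop.log like this one, the useful part is the WARN/ERROR entry and the first few lines under it (here, the `SolrException: Bad Request` raised while posting to `/solr/update`). The following is a minimal sketch for pulling those out, assuming the log line format shown above; `extract_problems` is a hypothetical helper name, not part of Nutch or Hadoop.

```python
import re

# A log entry starts with "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger - message".
ENTRY_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+)\s+(.*)$")


def extract_problems(log_text, context=3):
    """Return WARN/ERROR entries, each with up to `context` continuation lines."""
    problems = []
    current = None  # entry that is currently collecting continuation lines
    for line in log_text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            if match.group(1) in ("WARN", "ERROR"):
                current = [line]
                problems.append(current)
            else:
                current = None  # INFO etc.: stop collecting
        elif current is not None and len(current) <= context:
            # Lines without a timestamp belong to the previous entry
            # (exception message, request URL, stack frames).
            current.append(line)
    return ["\n".join(entry) for entry in problems]


# Excerpt taken from the log in the question.
SAMPLE = """\
2014-08-05 13:04:21,255 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-08-05 13:04:21,286 WARN mapred.LocalJobRunner - job_local1310160376_0001
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://my-solr-url:8983/solr/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
2014-08-05 13:04:21,544 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
"""

if __name__ == "__main__":
    for block in extract_problems(SAMPLE):
        print(block)
        print("---")
```

To run it on a real file, pass the file contents instead of `SAMPLE`, for example `extract_problems(open("logs/hadoop.log").read())`; the `logs/hadoop.log` path is an assumption about where your Nutch install writes its log.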
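This is probably due to a version difference: the tutorial suggests copying conf/schema.xml, but for this particular version of Solr the file to copy is schema-solr4.xml, and then, at line 351, add: [...]
Restart Solr with java -jar start.jar and everything works! Hope this helps someone else.

Thanks @JayeshBhoyar, I have added the logs. It would be great if you could help!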
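The file swap described in the answer (use Nutch's schema-solr4.xml as Solr's schema.xml) can be sketched as a small script. This is only an illustration under assumptions: `install_nutch_schema` is a hypothetical helper, and the directory names are placeholders for wherever your Nutch `conf` and Solr core `conf` directories live. It keeps a backup of the old schema, does not make the line-351 edit, and does not restart Solr.

```python
import shutil
import tempfile
from pathlib import Path


def install_nutch_schema(nutch_conf, solr_conf):
    """Copy Nutch's schema-solr4.xml over Solr's schema.xml, keeping a backup."""
    source = Path(nutch_conf) / "schema-solr4.xml"
    target = Path(solr_conf) / "schema.xml"
    if not source.is_file():
        raise FileNotFoundError(f"missing {source}")
    if target.exists():
        # Keep the previous schema so the change can be undone.
        shutil.copy2(target, target.with_name("schema.xml.bak"))
    shutil.copy2(source, target)
    return target


if __name__ == "__main__":
    # Demo against throwaway directories; real paths depend on your install.
    with tempfile.TemporaryDirectory() as tmp:
        nutch_conf = Path(tmp) / "nutch-conf"
        solr_conf = Path(tmp) / "solr-conf"
        nutch_conf.mkdir()
        solr_conf.mkdir()
        (nutch_conf / "schema-solr4.xml").write_text("<schema name='nutch'/>")
        (solr_conf / "schema.xml").write_text("<schema name='example'/>")
        installed = install_nutch_schema(nutch_conf, solr_conf)
        print(installed.read_text())
```

With a stock layout the call would look something like `install_nutch_schema("apache-nutch-1.8/conf", "solr-4.9.0/example/solr/collection1/conf")`, but both paths are assumptions. Afterwards, restart Solr with java -jar start.jar as the answer says.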