Nutch indexing fails with an IO exception


Nutch indexing fails when I run the following command:

root@ubuntu:/home/test-tb/Downloads/apache-nutch-1.10# bin/nutch index mycrl/crawldb/ -dir mycrl/segments/
I am using Nutch 1.10 on Ubuntu 12.04 LTS.

The error log details are as follows:

2015-07-09 17:07:36,940 INFO  indexer.IndexWriters - Adding    org.apache.nutch.indexwriter.solr.SolrIndexWriter
2015-07-09 17:07:36,970 INFO  solr.SolrMappingReader - source: content dest: content
2015-07-09 17:07:36,970 INFO  solr.SolrMappingReader - source: title dest: title
2015-07-09 17:07:36,970 INFO  solr.SolrMappingReader - source: host dest: host
2015-07-09 17:07:36,970 INFO  solr.SolrMappingReader - source: segment dest: segment
2015-07-09 17:07:36,970 INFO  solr.SolrMappingReader - source: boost dest: boost
2015-07-09 17:07:36,970 INFO  solr.SolrMappingReader - source: digest dest: digest
2015-07-09 17:07:36,970 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2015-07-09 17:07:37,030 INFO  solr.SolrIndexWriter - Indexing 100 documents
2015-07-09 17:07:37,136 INFO  solr.SolrIndexWriter - Indexing 100 documents
2015-07-09 17:07:37,166 WARN  mapred.LocalJobRunner - job_local1383488781_0001
org.apache.solr.common.SolrException: Not Found

Not Found

request: http://127.0.0.1:8983/solr/update?wt=javabin&version=2
at  org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

2015-07-09 17:07:37,957 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:113)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:177)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:187)

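This error is returned even though I did not pass any Solr indexing option to Nutch. Am I missing something? Any pointers would be very helpful. Thanks in advance.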

First, if you are both crawling and indexing data, you should use bin/crawl, since it is the better tool for that job.
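
A rough sketch of what such a bin/crawl call could look like follows. It assumes the Nutch 1.10 form of the script, crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>; running bin/crawl without arguments should print the usage for your version. The seed directory name, the Solr core URL, and the variable names are assumptions made for illustration. The script only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: every value below is an assumption, adjust it to your setup.
SOLR_URL="http://127.0.0.1:8983/solr/collection1"  # assumed Solr core URL
SEED_DIR="urls/"     # hypothetical directory that holds the seed list
CRAWL_DIR="mycrl/"   # crawl directory used in the question
NUM_ROUNDS=2         # number of crawl rounds to run

# Build the command as a string and print it for review instead of running it.
CRAWL_CMD="bin/crawl -i -D solr.server.url=$SOLR_URL $SEED_DIR $CRAWL_DIR $NUM_ROUNDS"
echo "$CRAWL_CMD"
```

Once the printed line looks right, it can be run from the Nutch installation directory.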

Second, the stack trace makes it clear that your Solr URL is not set correctly. A Solr URL should normally look like
http://domainname:port/solr/corename


However, the request in your log goes to http://127.0.0.1:8983/solr/update, so the URL is missing the name of the Solr core. By default, the core is named collection1.
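
As a minimal sketch of that fix, the core name can be appended to the base URL and passed to the indexer through the solr.server.url property. This assumes Solr listens on 127.0.0.1:8983 (as in the log) and that the core really is called collection1; neither is verified against your setup, and the variable names are made up for illustration. The script prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: SOLR_BASE and SOLR_CORE are assumptions, adjust them to your setup.
SOLR_BASE="http://127.0.0.1:8983/solr"  # host and port taken from the log in the question
SOLR_CORE="collection1"                 # assumed default core name
SOLR_URL="$SOLR_BASE/$SOLR_CORE"

# Build the index command with the full core URL and print it for review.
INDEX_CMD="bin/nutch index -Dsolr.server.url=$SOLR_URL mycrl/crawldb/ -dir mycrl/segments/"
echo "$INDEX_CMD"
```

With a URL of this shape, the update request should go to .../solr/collection1/update rather than .../solr/update, which is the address that returned Not Found.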

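Thanks ameer, but I have run into another strange problem. When I run the command "bin/crawl--index-dsolr.server.url=url/ts/2", I get the exception "Bad Request request: at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)". However, when I run the command with numrounds set to 1, it completes without any error. Any help?

Is Solr running? Can you elaborate on what you are trying to do?

Yes, it is running. I want to test Nutch and Solr in order to set up a simple search engine. To test it, I just used it as the seed file and followed the instructions on the website. If that works, I want to try the same thing with intranet URLs.

Can you show the error trace from nutch/logs/hadoop.log or from the Solr logs? I suspect you have not copied the schema file from nutch/conf/schema.xml to solr/collection1/schema.xml.

I copied schema.xml into the solr/corename/conf/ directory. Still the same problem. But I checked the Solr logs and found the following problem: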