
Error when running solrindexer


I am using Solr 3.6.1 and Nutch 1.5, and it worked fine: I crawl my site, index the data into Solr, and search with Solr. But about two weeks ago it stopped working... When I run `./nutch crawl url -solr http://localhost:8080/solr/ -depth 5 -topN 100` it works, but when I run `./nutch crawl url -solr http://localhost:8080/solr/ -depth 5 -topN 100000` it throws an exception, and in my log file I found this:

2013-02-05 17:04:20,697 INFO  solr.SolrWriter - Indexing 250 documents
2013-02-05 17:04:20,697 INFO  solr.SolrWriter - Deleting 0 documents
2013-02-05 17:04:21,275 WARN  mapred.LocalJobRunner - job_local_0029
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: `http://localhost:8080/solr/update?wt=javabin&version=2`
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:195)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:51)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-05 17:04:21,883 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-05 17:04:21,887 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-05 17:04:21
2013-02-05 17:04:21,887 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: `http://localhost:8080/solr/`    
Two weeks ago it worked fine... Has anyone run into a similar problem?

Hi, I just finished a crawl and hit the same exception, but when I looked at my log/hadoop.log file I found this:

2013-02-06 22:02:14,111 INFO  solr.SolrWriter - Indexing 250 documents
2013-02-06 22:02:14,111 INFO  solr.SolrWriter - Deleting 0 documents
2013-02-06 22:02:14,902 WARN  mapred.LocalJobRunner - job_local_0019
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: `http://localhost:8080/solr/update?wt=javabin&version=2`
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:304)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-06 22:02:15,027 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-06 22:02:15,032 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-06 22:02:15
2013-02-06 22:02:15,032 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: `http://localhost:8080/solr/`
2013-02-06 22:02:21,281 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2013-02-06 22:02:22,263 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: finished at 2013-02-06 22:02:22, elapsed: 00:00:07
2013-02-06 22:02:22,263 INFO  crawl.Crawl - crawl finished: crawl-20130206205733 

I hope this helps in understanding the problem...

Judging by the logs you have shown, I think the answer lies on the Solr side. There should be an exception trace there that tells you which component stopped the processing. If it worked two weeks ago, then either something changed (jar versions?) or you have a specific document that is causing the problem.

If the problem happens on every document (try several different ones), then something in your environment (jars, properties, etc.) has probably changed. If it happens on one subset of documents but not on another, then the specific documents are likely at fault (e.g., bad encoding).
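A minimal sketch of that bisection idea, assuming the standard Nutch 1.x `crawl` command line; the seed URLs, paths, and Solr core URL below are placeholders for illustration:

```shell
# Index the pages in small batches so a failing document can be narrowed down.
rm -f /tmp/seed.??
printf '%s\n' \
  http://example.com/a \
  http://example.com/b \
  http://example.com/c > /tmp/seeds.txt

# One seed file per batch (split creates /tmp/seed.aa, /tmp/seed.ab, ...):
split -l 1 /tmp/seeds.txt /tmp/seed.
ls /tmp/seed.*

# Then run one crawl/index cycle per batch; the batch that fails contains
# the problem document (command form per Nutch 1.x; core URL is assumed):
# for f in /tmp/seed.*; do
#   mkdir -p urls && cp "$f" urls/seed.txt
#   ./nutch crawl urls -solr http://localhost:8080/solr/ -depth 1 -topN 10
# done
```

With `-l 1` each batch holds a single URL, so the first failing batch pinpoints the document directly; larger batch sizes trade precision for fewer runs.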


Either way, the first thing to check is the stack trace on the Solr side.
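For example (a sketch: Solr 3.x under Tomcat logs through java.util.logging, so server-side failures show up as SEVERE entries in catalina.out; the path varies by install, and the log lines below are invented purely for illustration):

```shell
# Illustrative sample standing in for a real catalina.out:
cat > /tmp/catalina.sample.out <<'EOF'
Feb 5, 2013 5:04:21 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: sample cause goes here
INFO: [] webapp=/solr path=/update params={wt=javabin&version=2} status=500 QTime=12
EOF

# The SEVERE entries (plus a line of context) name the real cause that the
# client only sees as "Internal Server Error" / "Bad Request":
grep -A1 'SEVERE' /tmp/catalina.sample.out

# On a real Ubuntu tomcat6 install, something like:
# grep -A20 'SEVERE' /var/log/tomcat6/catalina.out | less
```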

It seems the failure happens inside the map-reduce job; you can check the Hadoop logs for more details.

I edited the answer and added the last part of the log file... Thanks for your answer.

Solr probably received a malformed request; you will find the details of the bad request, and the underlying problem, in the Solr log.

Hi, thanks for the reply... I tried crawling another page, e.g. www.woodgears.ca, and two other pages, with the same result and the same exception... so I don't think it is related to the data. I am now using Nutch 1.6 and Solr 3.6.2, and on Ubuntu I installed tomcat6...??
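Since the answer above raises "jar versions?", one quick check is to compare the SolrJ jar that Nutch bundles against the Solr version deployed on the server. A sketch, demonstrated on a mock directory layout; the jar name and all paths are illustrative assumptions:

```shell
# Mock layout standing in for a Nutch install; on a real setup point these
# commands at $NUTCH_HOME/lib and the deployed solr webapp instead.
rm -rf /tmp/mock-nutch
mkdir -p /tmp/mock-nutch/lib
touch /tmp/mock-nutch/lib/apache-solr-solrj-3.4.0.jar  # illustrative file name

# Which SolrJ does the client carry?
ls /tmp/mock-nutch/lib | grep -i solrj

# And what does the server run? (example path for an Ubuntu tomcat6 deploy):
# ls /var/lib/tomcat6/webapps/solr/WEB-INF/lib/ | grep -i solr-core
```

If the client and server versions diverge significantly, the javabin wire format used by `/update?wt=javabin` can be a source of otherwise puzzling request failures.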