Solr error during bulk load - org.apache.lucene.index.MergePolicy$MergeException


While bulk loading millions of records via sstableloader, I am getting a lot of the following exceptions:

ERROR [Lucene Merge Thread #132642] 2014-07-29 00:35:01,252 CassandraDaemon.java (line 199) Exception in thread Thread[Lucene Merge Thread #132642,6,main]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: failed
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.IllegalStateException: failed
        at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:93)
        at org.apache.lucene.util.packed.BlockPackedReader.get(BlockPackedReader.java:86)
        at org.apache.lucene.util.LongValues.get(LongValues.java:35)
        at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$5.getOrd(Lucene45DocValuesProducer.java:459)
        at org.apache.lucene.codecs.DocValuesConsumer$4$1.setNext(DocValuesConsumer.java:389)
        at org.apache.lucene.codecs.DocValuesConsumer$4$1.hasNext(DocValuesConsumer.java:352)
        at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addNumericField(Lucene45DocValuesConsumer.java:141)
        at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addSortedField(Lucene45DocValuesConsumer.java:350)
        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedField(PerFieldDocValuesFormat.java:116)
        at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedField(DocValuesConsumer.java:305)
        at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:197)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:116)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4058)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.EOFException: Read past EOF (resource: BBIndexInput(name=_13ms5_Lucene45_0.dvd))
        at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.switchCurrentBuffer(ByteBufferIndexInput.java:188)
        at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:129)
        at org.apache.lucene.store.DataInput.readShort(DataInput.java:77)
        at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.readShort(ByteBufferIndexInput.java:89)
        at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:64)
        ... 15 more
I can see from the exception trace that it is related to long values and an EOF. However, I have no idea what triggers the error. The SSTable files I am trying to import were generated by a Java program (written by me) that uses org.apache.cassandra.io.sstable.CQLSSTableWriter.

The CF schema, Solr schema, and SSTable generator code can be found here:
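The linked generator code is not reproduced here, but a minimal sketch of how such a generator typically drives CQLSSTableWriter might look like the following. The keyspace, table, columns, partitioner, and output directory are all placeholders, not the actual schema from the linked code; this requires the cassandra-all 2.0.x jar on the classpath and is not a drop-in replacement for the real generator.

```java
import java.io.File;
import org.apache.cassandra.dht.Murmur3Partitioner;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class SSTableGenerator {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema and insert statement -- the real CF schema
        // lives in the linked code.
        String schema = "CREATE TABLE myks.mycf (id bigint PRIMARY KEY, payload text)";
        String insert = "INSERT INTO myks.mycf (id, payload) VALUES (?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory(new File("/tmp/myks/mycf"))      // output dir for SSTables
                .withPartitioner(new Murmur3Partitioner())    // must match the cluster
                .forTable(schema)
                .using(insert)
                .build();

        for (long i = 0; i < 1_000_000; i++) {
            writer.addRow(i, "row-" + i);
        }
        writer.close(); // flush final SSTable; the directory is then fed to sstableloader
    }
}
```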

P.S.:

  • I recently upgraded from DSE 4.1.3 to 4.5.1. I don't recall seeing this error before the upgrade.
  • The Cassandra library on the generator's classpath is version 2.0.8. Before the DSE upgrade, it used the 2.0.5 library.
  • DSE topology: 1 DC, 6 Solr nodes (vnodes disabled), RF 2
  • Other DSE configuration: LeveledCompaction, LZ4 compression, GossipingPropertyFileSnitch

  • Machine specs: CentOS 6.5 x64, JDK 1.7.0_55, hexacore, 120 GB heap (we have specific query requirements), 128 GB total RAM
I originally hit the error on 3 of the 6 nodes. I restarted all of them and was able to import 150+ million records without any errors. However, when I left the import unattended overnight, the error reappeared on 1 of the 6 nodes.

I am now very worried, because the number of indexed records on each node (according to the Solr admin UI) is about 60,000 fewer than the Cassandra row count (according to nodetool cfstats).
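To keep an eye on that drift, the two counts can be compared programmatically. Here is a small sketch; the cfstats label format it parses is an assumption (the exact wording varies across Cassandra versions), and in practice the inputs would come from `nodetool cfstats` output and the Solr core's numDocs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexDriftCheck {

    // Extracts the row count from `nodetool cfstats` text output.
    // The label varies across Cassandra versions ("Number of keys" vs
    // "Number of keys (estimate)"), so the pattern is deliberately loose.
    static long parseKeyCount(String cfstatsOutput) {
        Pattern p = Pattern.compile("Number of keys[^:]*:\\s*(\\d+)");
        Matcher m = p.matcher(cfstatsOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("no key count found in cfstats output");
        }
        return Long.parseLong(m.group(1));
    }

    // How far the Solr core's numDocs lags behind the Cassandra row count.
    static long drift(long cassandraKeys, long solrNumDocs) {
        return cassandraKeys - solrNumDocs;
    }
}
```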

Update:

Still experiencing this. The gap between the number of indexed documents (Solr) and stored rows (Cassandra cfstats) grows by the day.

Update (2014-08-13):

Changed the directory factory as suggested by Rock Brain, but the error happened again within hours of continuous importing via sstableloader.

Update (2014-08-14):

Interestingly, I noticed that I'm actually getting two similar exceptions (the only difference is the stack trace of the last "Caused by"):

Exception 1:

ERROR [Lucene Merge Thread #24937] 2014-08-14 06:20:32,270 CassandraDaemon.java (line 199) Exception in thread Thread[Lucene Merge Thread #24937,6,main]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: failed
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.IllegalStateException: failed
        at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:93)
        at org.apache.lucene.util.packed.BlockPackedReader.get(BlockPackedReader.java:86)
        at org.apache.lucene.util.LongValues.get(LongValues.java:35)
        at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$5.getOrd(Lucene45DocValuesProducer.java:459)
        at org.apache.lucene.codecs.DocValuesConsumer$4$1.setNext(DocValuesConsumer.java:389)
        at org.apache.lucene.codecs.DocValuesConsumer$4$1.hasNext(DocValuesConsumer.java:352)
        at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addNumericField(Lucene45DocValuesConsumer.java:141)
        at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addSortedField(Lucene45DocValuesConsumer.java:350)
        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedField(PerFieldDocValuesFormat.java:116)
        at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedField(DocValuesConsumer.java:305)
        at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:197)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:116)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4058)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.EOFException: Read past EOF (resource: BBIndexInput(name=_67nex_Lucene45_0.dvd))
        at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.switchCurrentBuffer(ByteBufferIndexInput.java:188)
        at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:129)
        at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:64)
        ... 15 more
Exception 2 (exactly the same as the original exception at the top of this post):

Update part 2 (2014-08-14):

Sample warnings during core reload:

 WARN [http-8983-2] 2014-08-14 08:31:28,828 CassandraCoreContainer.java (line 739) Too much waiting for new searcher...
 WARN [http-8983-2] 2014-08-14 08:31:28,831 SolrCores.java (line 375) Tried to remove core myks.mycf from pendingCoreOps and it wasn't there.
 INFO [http-8983-2] 2014-08-14 08:31:28,832 StorageService.java (line 2644) Starting repair command #3, repairing 0 ranges for keyspace solr_admin
 INFO [http-8983-2] 2014-08-14 08:31:28,835 SolrDispatchFilter.java (line 672) [admin] webapp=null path=/admin/cores params={slave=true&deleteAll=false&name=myks.mycf&distributed=false&action=RELOAD&reindex=false&core=myks.mycf&wt=javabin&version=2} status=0 QTime=61640
Update (2014-08-23):


After re-applying the suggested workaround, I can no longer reproduce the exception.

Updated solrconfig.xml for all cores: swapped the directoryFactory from com.datastax.bdp.cassandra.index.solr.DSENRTCachingDirectoryFactory to solr.MMapDirectoryFactory.
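Concretely, the workaround amounts to editing each core's solrconfig.xml so that the directoryFactory element points at the MMap implementation (the `name` attribute shown here is the standard Solr form; adjust to match your existing config):

```xml
<!-- Before (DSE default at the time): -->
<directoryFactory name="DirectoryFactory"
                  class="com.datastax.bdp.cassandra.index.solr.DSENRTCachingDirectoryFactory"/>

<!-- After (workaround): -->
<directoryFactory name="DirectoryFactory"
                  class="solr.MMapDirectoryFactory"/>
```

After editing, each core has to be reloaded (e.g. via the /admin/cores RELOAD action) for the change to take effect.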


Also, what OS and JVM version are you using, how many CPUs, what heap size, and how much total available memory? How many minutes/hours into the load do the errors appear?

CentOS 6.5 x64, JDK 1.7.0_55, hexacore, 120 GB heap (we have specific query requirements), 128 GB total RAM. The errors show up randomly.

Ok thanks. Working on this. The workaround is to use "solr.MMapDirectoryFactory".

Could you briefly explain what is happening, i.e. what is the cause? Is this specific to DSE 4.5? Will a reindex be necessary? (I've already seen a big discrepancy between "Number of keys" in nodetool cfstats and "numDocs" in the Solr admin UI.) Also, can the directory factory be changed on the fly from "com.datastax.bdp.cassandra.index.solr.DSENRTCachingDirectoryFactory" to "solr.MMapDirectoryFactory" as you suggest? We already have data in our cluster and we don't want to start over.

The issue doesn't appear to be specific to a particular version of DSE. Since Lucene segments are written only once and then merged, a reindex is not necessary: the error occurs during merging, which means the original index segments still exist. However, if data has been lost, a reindex may be needed.