Why does the Hibernate MassIndexer say indexing is complete when it isn't?


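I am trying to use the MassIndexer to index a large amount of data into Elasticsearch: 13.5 million records, spread across 7-8 related tables. After reaching 39.08%, it logs a message saying that all records have been indexed. I get the same problem both locally and in production, and the percentage at which it stops is different on every run.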

fullTextEntityManager
        .createIndexer(XYZ.class)
        .batchSizeToLoadObjects(500)
        .cacheMode(CacheMode.IGNORE)
        .threadsToLoadObjects(2)
        .idFetchSize(Integer.MIN_VALUE)
        .startAndWait();
Logs:

It should only report that indexing is complete once all records have actually been indexed.

This looks a lot like a known issue that was fixed in 6.0.0.Alpha2 but was not backported to 5.11.

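Long story short: this is a logging problem, not an indexing problem. The last line says that everything has been reindexed, and that is the line you should trust.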
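You may not want to rely on the final log line alone. One way to cross-check it is to compare the Elasticsearch document count with the database row count.

The sketch below is a minimal, hypothetical helper; the class and method names are made up for illustration. It does two things:

- It pulls the top-level "count" field out of a GET /<index>/_count response.
- It reports how many rows have no matching document.

It assumes you compare against the row count of the root indexed entity (XYZ) only. Associations mapped with @IndexedEmbedded end up inside the root entity's document rather than as separate documents.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexCountCheck {

    // Matches the top-level "count" field of an Elasticsearch _count response,
    // e.g. {"count":42,"_shards":{...}}. A real JSON parser is preferable in production.
    private static final Pattern COUNT = Pattern.compile("\"count\"\\s*:\\s*(\\d+)");

    /** Extracts the document count from a _count response body. */
    static long parseCount(String countResponseJson) {
        Matcher m = COUNT.matcher(countResponseJson);
        if (!m.find()) {
            throw new IllegalArgumentException("no count field in: " + countResponseJson);
        }
        return Long.parseLong(m.group(1));
    }

    /** Number of database rows without a matching document (0 means the counts agree). */
    static long missingDocuments(long databaseRows, long indexedDocuments) {
        return Math.max(0, databaseRows - indexedDocuments);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: in practice the JSON would come from GET /<index>/_count
        // and databaseRows from a SELECT COUNT(*) on the root entity's table.
        String json = "{\"count\":900,\"_shards\":{\"total\":5,\"successful\":5,\"failed\":0}}";
        long indexed = parseCount(json);
        System.out.println("missing documents: " + missingDocuments(1000L, indexed));
    }
}
```

Bulk requests sent with refresh=false may not be visible to _count until the index has refreshed. Give it a moment, or trigger a refresh, before comparing.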

I'll see if we can easily backport the patch to 5.10/5.11, but it may take some time before we release those branches again. Backport ticket (if you need to track progress):

Your logs clearly show that there were errors during mass indexing, which you did not mention in your original post.

You regularly get errors like this one:

10:48:28,125 (Hibernate Search: Elasticsearch transport thread-2) ERROR LogErrorHandler:71 - HSEARCH000058: Exception occurred org.hibernate.search.exception.SearchException: HSEARCH400007: Elasticsearch request failed.
Request: POST /_bulk with parameters {refresh=false}
Response: null
Subsequent failures:
    Entity com.example.model.XXXXXX  Id 855665929073643520  Work Type  org.hibernate.search.backend.AddLuceneWork

org.hibernate.search.exception.SearchException: HSEARCH400007: Elasticsearch request failed.
Request: POST /_bulk with parameters {refresh=false}
Response: null
    at org.hibernate.search.elasticsearch.work.impl.BulkWork.lambda$execute$1(BulkWork.java:77)
    at org.hibernate.search.util.impl.Futures.lambda$handler$1(Futures.java:57)
    at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
    at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.hibernate.search.elasticsearch.client.impl.DefaultElasticsearchClient$1.onFailure(DefaultElasticsearchClient.java:123)
    at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:605)
    at org.elasticsearch.client.RestClient$1.retryIfPossible(RestClient.java:396)
    at org.elasticsearch.client.RestClient$1.failed(RestClient.java:375)
    at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:134)
    at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:419)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:375)
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException
    ... 11 more
This essentially means that some indexing requests failed because Elasticsearch took too long to respond.
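To make the "Caused by: java.net.SocketTimeoutException" in the trace more concrete, here is a small JDK-only analogy. A local server accepts a connection but never answers, and the client gives up after its read timeout.

This is only an analogy. In the real trace, the timeout is raised by the Apache HTTP async client's I/O reactor (sessionTimedOut), not by a blocking socket. The class name and the 200 ms value are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to a local server that accepts the connection but never writes a byte,
     * and returns true if the client's read fails with SocketTimeoutException.
     */
    static boolean readTimesOut(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port.
        try (ServerSocket silentServer = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) { // accepted, but we never reply
            client.setSoTimeout(readTimeoutMillis);     // client-side read timeout
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks: the "server" never answers
                return false;
            } catch (SocketTimeoutException e) {
                return true; // same exception type as the root cause in the log
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

Raising the client-side timeout only helps if Elasticsearch does eventually answer. If the cluster is overloaded, the fix is on the cluster side.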

There could be many reasons for that.

Your Hibernate Search configuration looks very conservative (only two threads), so I don't think you are putting much pressure on your Elasticsearch cluster.

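I would suggest double-checking your Elasticsearch setup; the Elasticsearch documentation may have some helpful advice. In particular, check that your Elasticsearch cluster is appropriately sized, and that the servers running it are appropriately sized as well.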


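You may also want to tune the hibernate.search configuration properties related to communication with the Elasticsearch cluster: timeouts, number of connections, and so on. See: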
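As a rough illustration of the timeout and connection properties mentioned above, here is a Java sketch that builds them as overrides. The property names are as I recall them from the Hibernate Search 5.x reference documentation for the Elasticsearch integration, so verify them against the exact version you run.

The values are arbitrary examples, not recommendations. The class name is hypothetical.

```java
import java.util.Properties;

public class ElasticsearchClientTuning {

    // Prefix for the default index manager's Elasticsearch client settings
    // (per the Hibernate Search 5.x docs; double-check for your version).
    static final String PREFIX = "hibernate.search.default.elasticsearch.";

    /** Builds overrides for client timeouts and connection pool size (values are illustrative). */
    static Properties tuning() {
        Properties p = new Properties();
        p.setProperty(PREFIX + "request_timeout", "120000");    // ms, whole request
        p.setProperty(PREFIX + "read_timeout", "120000");       // ms, waiting for response data
        p.setProperty(PREFIX + "connection_timeout", "10000");  // ms, establishing a connection
        p.setProperty(PREFIX + "max_total_connection", "40");   // connections across all hosts
        p.setProperty(PREFIX + "max_total_connection_per_route", "10"); // connections per host
        return p;
    }

    public static void main(String[] args) {
        tuning().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

These overrides could be passed as the second argument of Persistence.createEntityManagerFactory(unitName, properties), or copied into persistence.xml or hibernate.properties. The persistence unit name is yours to fill in.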

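Thanks for your reply. I checked the Elasticsearch document count using /index/_count, and it does not match the number of database records. In the logs there is only a 12-minute gap between the 39% line and the last statement, and it's impossible to index 8 million records in that time. Correction: in the logs the gap is only 12 seconds; I wrote minutes in my previous comment.

Then most likely some failures occurred during indexing. Normally, exceptions are reported to the ErrorHandler, which by default logs them at ERROR level. Did you see anything like that? If not, check your custom error handler (if you have one). Also check that the logger org.hibernate.search.exception.impl.LogErrorHandler is enabled at ERROR level in your log4j configuration or equivalent.

There are no errors on the console. I set the Hibernate logs to debug level:
log4j.logger.org.hibernate=debug
log4j.logger.org.hibernate.search=debug

At this point, the only way I can see this happening is if an Error was thrown. That is, while indexing an entity, an Error (not an Exception, an Error) was thrown, nothing caught it, and it ended up killing the thread.

Did you also look at standard output/standard error? Depending on your configuration, stdout/stderr may be redirected to a different file than your logs. I know this can happen on Tomcat in particular, and possibly on other platforms. If so, you might have missed an OutOfMemoryError or a similar error dumped to standard output.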
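The point above is that an Error (not an Exception) can slip past error handling and kill a thread. Here is a JDK-only sketch of that behaviour. It is not Hibernate Search code; the class and method names are hypothetical.

A worker guards its work with catch (Exception), the way code that forwards failures to an error handler might. An IllegalStateException is handled inside the worker. An OutOfMemoryError bypasses that catch block and terminates the thread, and only the thread's uncaught-exception handler sees it.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ErrorEscapesCatchException {

    /**
     * Runs a worker that guards its work with catch (Exception) and returns whatever
     * Throwable escaped and terminated the thread (null if nothing escaped).
     */
    static Throwable runWorker(Runnable work) {
        AtomicReference<Throwable> escaped = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } catch (Exception e) {
                // Exceptions end up here, e.g. forwarded to an error handler...
                System.err.println("handled: " + e);
            }
            // ...but an Error skips the catch block above and ends the thread.
        });
        // Without this handler, the default one would just print the stack trace to stderr.
        worker.setUncaughtExceptionHandler((thread, error) -> escaped.set(error));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        }
        return escaped.get();
    }

    public static void main(String[] args) {
        Throwable fromException = runWorker(() -> { throw new IllegalStateException("bad entity"); });
        Throwable fromError = runWorker(() -> { throw new OutOfMemoryError("simulated"); });
        System.out.println("escaped from exception case: " + fromException);
        System.out.println("escaped from error case: " + fromError);
    }
}
```

The default uncaught-exception handler writes to System.err. This is why such an Error may show up only in a redirected stderr file and not in the application log.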