
OutOfMemoryError when using the Elasticsearch REST Java client via Apache HTTP NIO


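We use the Elasticsearch REST Java client (we are on Java 7, so we cannot use the regular Elasticsearch Java client) to talk to our Elasticsearch server. This all works well, except when we try to do the initial indexing of about 1.3 million documents. That runs for a while, but after a few hundred thousand documents we get the following error: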

20/06 21:27:33,153 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1) Exception in thread "pool-837116-thread-1" java.lang.OutOfMemoryError: unable to create new native thread
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at java.lang.Thread.start0(Native Method)
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at java.lang.Thread.start(Thread.java:693)
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:334)
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:194)
20/06 21:27:33,154 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
20/06 21:27:33,155 ERROR [cid=51][stderr][write:71] (pool-837116-thread-1)  at java.lang.Thread.run(Thread.java:724)

java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
    at org.apache.http.util.Asserts.check(Asserts.java:46)
    at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
    at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
    at org.elasticsearch.client.RestClient.performRequestAsync(RestClient.java:343)
    at org.elasticsearch.client.RestClient.performRequestAsync(RestClient.java:325)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:218)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:191)
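As you can see, the Elasticsearch REST client uses Apache HTTP NIO. What I find strange is that the NIO library seems to create a thread for every request (or connection?). You can see the thread name (pool-837116-thread-1) in the log above. There are also many I/O dispatcher threads whose numbers keep increasing.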

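The total number of live threads does not seem to change much, though. So it looks as if, instead of reusing threads, one (actually two) new threads are created for each connection cycle. The upload basically does the following: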

1. Create the client

    restClient = RestClient.builder(new HttpHost(host.getHost(), host.getPort(), host.getProtocol()) /*, new HttpHost(host.getHost(), host.getPort() + 1, host.getProtocol()) */)
            .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                @Override
                public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
                    return httpClientBuilder
                            .setDefaultCredentialsProvider(credsProvider);
                }
            }).setMaxRetryTimeoutMillis(30000).build();

2. Send the request with a JSON body and close the client

        try{
            HttpEntity entity = new NStringEntity(json,ContentType.APPLICATION_JSON);
            Response indexResponse = restClient.performRequest("PUT", endpoint, parameters,entity,header);
            log.debug("Response #0 #1", indexResponse,indexResponse.getStatusLine());
            log.debug("Entity #0",indexResponse.getEntity());

        }finally{
            if(restClient!=null){
                log.debug("Closing restClient #0", restClient);
                restClient.close();
            }
        }
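
Is this normal? Why does Apache NIO not reuse threads? Is this a problem with the Elasticsearch REST client, with Apache NIO, or with my code? I do call close() on the restClient, so I am not sure what else I am supposed to do.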

I have tried setting the number of threads on the I/O reactor to just 1:

    restClient = RestClient.builder(new HttpHost(host.getHost(), host.getPort(), host.getProtocol()) /*, new HttpHost(host.getHost(), host.getPort() + 1, host.getProtocol()) */)
            .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                @Override
                public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
                    return httpClientBuilder
                            .setDefaultCredentialsProvider(credsProvider)
                            .setDefaultIOReactorConfig(IOReactorConfig.custom().setIoThreadCount(1).build()); // set to one thread
                }
            }).setMaxRetryTimeoutMillis(30000).build();

But this did not change anything about the thread reuse.

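I found the cause of the OutOfMemoryError. Although I used a try-finally block to close the client, an exception was being thrown outside that block (the block did not cover everything, D'oh). Creating this many threads still looks wrong, though (even if the total thread count does not grow significantly).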
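
To illustrate the kind of leak described above, here is a minimal sketch. It does not use the real client: `FakeClient`, `prepare`, `leaky` and `safe` are hypothetical names made up for this example, and `FakeClient` only stands in for any `Closeable` that owns threads (as `RestClient` does). The point is that any statement that can throw between creating the client and entering the `try` block skips the `finally`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a client that owns threads; not the real RestClient.
class FakeClient implements Closeable {
    // Number of clients that have been created but not yet closed.
    static final AtomicInteger OPEN = new AtomicInteger();

    FakeClient() {
        OPEN.incrementAndGet();
    }

    void send(String json) throws IOException {
        if (json == null) {
            throw new IOException("nothing to send");
        }
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

class CloseDemo {
    // Hypothetical preparation step that can fail, e.g. serializing the document.
    static String prepare(Object doc) {
        if (doc == null) {
            throw new IllegalArgumentException("no document");
        }
        return doc.toString();
    }

    // Leaky: prepare() runs after the client exists but before the try block,
    // so an exception there never reaches the finally and the client stays open.
    static void leaky(Object doc) throws IOException {
        FakeClient client = new FakeClient();
        String json = prepare(doc);
        try {
            client.send(json);
        } finally {
            client.close();
        }
    }

    // Safe: try-with-resources (available since Java 7) closes the client
    // on every exit path, including a failure inside prepare().
    static void safe(Object doc) throws IOException {
        try (FakeClient client = new FakeClient()) {
            client.send(prepare(doc));
        }
    }
}
```

Calling `CloseDemo.leaky(null)` leaves one client open, while `CloseDemo.safe(null)` leaves none. Independently of this, the REST client is meant to be long-lived, so building it once and sharing it across all index requests (closing it only at shutdown) would probably also avoid the constant creation of new reactor threads.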

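Are you using bulk inserts or inserting one by one?

It is a one-by-one insert, because I need to make sure each document's data has actually been uploaded. It uses the same mechanism as the normal indexing.

Bulk inserts also tell you when a single record in the batch fails, so that might be an option. The bulk API is faster and less resource intensive (even for very small batches).

Some of the JSON data has newlines in it (which I cannot remove). Will bulk insert still work? I have read that you should not pretty-print the JSON because the bulk API uses newlines to separate commands. Is it clever enough to skip newlines inside quotes?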
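
On the newline question, a hedged sketch may help. In valid JSON a raw line break cannot appear inside a string value; it is written as the two-character escape `\n`. A document serialized compactly therefore occupies a single line even if its text contains line breaks, and only pretty-printing whitespace between tokens would break the newline-delimited bulk format. The class `BulkBody`, the field name `text` and the use of the list position as `_id` are assumptions made for this illustration; the hand-written `quote` helper only stands in for what a JSON library would normally do.

```java
import java.util.List;

class BulkBody {
    // Minimal JSON string escaper for this sketch; a JSON library does this for you.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become four-digit hex escapes.
                        sb.append("\\u").append(String.format("%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds a newline-delimited bulk body: one action line and one source line
    // per document, each terminated by a newline (including the last one).
    static String build(List<String> texts) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < texts.size(); i++) {
            body.append("{\"index\":{\"_id\":").append(quote(String.valueOf(i))).append("}}\n");
            body.append("{\"text\":").append(quote(texts.get(i))).append("}\n");
        }
        return body.toString();
    }
}
```

For example, `BulkBody.build(Arrays.asList("line one\nline two"))` yields exactly two lines, the second being `{"text":"line one\nline two"}` with the line break escaped. Such a body would be sent to an index-specific `_bulk` endpoint, and the per-item results in the response would then need to be checked for failures.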