Write timeout thrown by the Cassandra DataStax Java driver


While performing a bulk data load (incrementing counters based on log data), I'm running into a timeout exception. I'm using the DataStax 2.0-rc2 Java driver.
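
For context, the write path is essentially a loop of synchronous counter increments. A minimal sketch of the kind of code involved (the keyspace, table, and column names here are made up for illustration):

import java.util.Arrays;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Minimal sketch of the bulk counter-increment load, using the 2.0
// Java driver. The schema names (stats, event_counts, hits,
// event_type) are hypothetical.
public class CounterLoad {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("stats");
        PreparedStatement incr = session.prepare(
                "UPDATE event_counts SET hits = hits + 1 WHERE event_type = ?");
        List<String> eventTypes = Arrays.asList("login", "logout"); // stand-in for parsed log data
        for (String eventType : eventTypes) {
            // Each synchronous execute() blocks until the coordinator
            // acknowledges the write or reports a timeout.
            session.execute(incr.bind(eventType));
        }
        cluster.close(); // close() in the 2.0 driver; shutdown() in older versions
    }
}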

Is this a problem of the server not being able to keep up (i.e., a server-side configuration issue), or of the client getting tired of waiting for the server? Either way, is there a simple configuration change that would fix this?

Exception in thread "main" com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
    at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:187)
    at com.datastax.driver.core.Session.execute(Session.java:126)
    at jason.Stats.analyseLogMessages(Stats.java:91)
    at jason.Stats.main(Stats.java:48)
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:92)
    at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(ResultSetFuture.java:122)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:224)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:373)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:510)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:53)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:33)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:165)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
    ... 21 more
One of the nodes reports this at roughly the same time it occurs:

ERROR [Native-Transport-Requests:12539] 2014-02-16 23:37:22,191 ErrorMessage.java (line 222) Unexpected exception during request
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(Unknown Source)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.read(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

While I don't understand the root cause of this problem, I was able to work around it by increasing the timeout value in the conf/cassandra.yaml file:

write_request_timeout_in_ms: 20000
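
Note that the driver has its own per-request read timeout on the client side as well (12 seconds by default in the 2.0 driver), so if you raise the server-side timeout beyond that, raise the driver's too. A sketch, assuming the 2.0 driver's SocketOptions API:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SocketOptions;

// Keep the client-side read timeout above write_request_timeout_in_ms,
// otherwise the driver gives up before the server answers.
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withSocketOptions(new SocketOptions().setReadTimeoutMillis(25000))
        .build();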

It is the coordinator (and therefore the server) timing out while waiting for write acknowledgements.

We ran into a similar problem on a single node in an ESX cluster with SAN storage attached (admittedly not an ideal setup, but we have no other option at the moment).

Note: the settings below may take a big bite out of the maximum performance Cassandra can achieve, but we chose a stable system over high performance.

While running iostat -xmt 1, we found long I/O wait times coinciding with the WriteTimeoutExceptions. It turned out that the memtable could not be flushed to disk within the default write_request_timeout_in_ms: 2000 setting.

We significantly reduced the memtable size from 512 MB (the default is 25% of heap space, which was 2 GB in our case) to 32 MB:
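
On Cassandra 2.0 the corresponding cassandra.yaml change would look roughly like this (2.1+ splits the limit into memtable_heap_space_in_mb / memtable_offheap_space_in_mb, so the exact key depends on your version):

# Total memory permitted for all memtables; Cassandra flushes the
# largest memtable when this limit is reached. If omitted, it defaults
# to 1/4 of the heap. Reduced here so flushes stay small and fast.
memtable_total_space_in_mb: 32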

We also increased the write timeout slightly, to 3 seconds:

write_request_timeout_in_ms: 3000

If I/O wait times are long, also make sure writes are synced to disk periodically:

#commitlog_sync: batch
#commitlog_sync_batch_window_in_ms: 2
#
# the other option is "periodic" where writes may be acked immediately
# and the CommitLog is simply synced every commitlog_sync_period_in_ms
# milliseconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

These settings keep the memtables small and flushed frequently. The exceptions were resolved, and we passed the stress tests run against the system.

It's worth double-checking your Cassandra GC settings.

In my case I was using a semaphore to throttle asynchronous writes, but I was still (sometimes) getting timeouts.
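
The throttling pattern in question looks roughly like this (a sketch, assuming the 2.0 driver's async API; the permit count is an arbitrary example):

import java.util.concurrent.Semaphore;

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

// Semaphore-throttled async writes: at most 256 requests in flight.
final Semaphore permits = new Semaphore(256);

void throttledWrite(Session session, Statement stmt) throws InterruptedException {
    permits.acquire(); // block until an in-flight slot frees up
    ResultSetFuture f = session.executeAsync(stmt);
    Futures.addCallback(f, new FutureCallback<ResultSet>() {
        public void onSuccess(ResultSet rs) { permits.release(); }
        public void onFailure(Throwable t) { permits.release(); }
    });
}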

It later turned out that I was running with unsuitable GC settings: for convenience I was using cassandra-unit, which had the unintended consequence of running Cassandra with the default JVM settings. As a result, we would eventually trigger a stop-the-world GC, causing the write timeouts. Applying the same GC settings as my running Cassandra Docker image fixed everything.


This is probably an uncommon cause, but it would have helped me, so it seems worth recording here.
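
For reference, the CMS-oriented GC defaults shipped in the stock cassandra-env.sh of that era looked roughly like the excerpt below. The point is to run Cassandra with deliberate GC settings rather than bare JVM defaults; this is an illustrative excerpt, not necessarily the exact flags the answer refers to:

JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"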

I ran into the same problem once. I was using BatchStatement to write data into Cassandra, with a batch size of 10,000. After I reduced the batch size, I stopped getting the exception. So perhaps you are trying to load too much data into Cassandra in a single request, which is actually a very bad choice (see the sketch after the comments below).

Comments:

Did you ever find out why this happens? I'm facing the same error right now.

@Superbrain_ thanks for sharing your judgment of this workaround. I'm sure some people will find your judgment interesting. If you find an alternative solution to this problem, I'm sure everyone would like to hear it.

One possible reason is that Cassandra is running some memory-intensive internal process such as compaction or repair, and you simply don't have enough memory to complete the write within 2 seconds. This happens to me a lot during development: it works fine for 10-15 minutes, then this error appears and I have to restart it. Very annoying.

Hi Chris, how can I debug further to find out why the ACK never shows up? I'm facing a similar issue and trying to find the root cause... Thanks.
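
As a rough illustration of "reduce the batch size" with the 2.0 driver: rather than accumulating 10,000 statements in one BatchStatement, flush in small chunks. The chunk size and the bound statement here are placeholders:

import java.util.List;

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Illustrative chunked writer: execute the batch every 100 statements
// instead of building one huge 10,000-statement batch. Counter updates
// would need new BatchStatement(BatchStatement.Type.COUNTER).
void writeInChunks(Session session, PreparedStatement insert, List<Object[]> rows) {
    BatchStatement batch = new BatchStatement();
    for (Object[] row : rows) {
        batch.add(insert.bind(row));
        if (batch.size() >= 100) {  // small chunks keep per-request work bounded
            session.execute(batch);
            batch.clear();
        }
    }
    if (batch.size() > 0) {
        session.execute(batch);     // flush the remainder
    }
}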