Storm topology stops emitting after a StormServerHandler Netty error caused by OptionalDataException


We have a Storm cluster with 3 nodes running multiple topologies. We are using apache-storm-1.2.2 and Java 1.8.0_162.

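Our current problem is that, after an error occurs and the Netty server becomes unavailable, random topologies stop emitting at random times. This can happen after a few hours or after a few days.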

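Since we haven't changed the logic by which our bolts emit or process data, we currently have no idea how such an error can be thrown. Another open question is why the whole topology stops working after an error like this.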

It looks like something goes wrong while deserializing some HashMap, but we don't know how this can happen.

Here is the error from one of the workers that caused the failure:

2019-09-24 14:31:02.414 o.a.s.m.n.StormServerHandler Netty-server-localhost-6727-worker-2 [ERROR] server errors in handling the request
java.lang.RuntimeException: java.io.OptionalDataException
        at org.apache.storm.serialization.SerializableSerializer.read(SerializableSerializer.java:58) ~[storm-core-1.2.2.jar:1.2.2]
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) ~[kryo-3.0.3.jar:?]
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) ~[kryo-3.0.3.jar:?]
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) ~[kryo-3.0.3.jar:?]
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:689) ~[kryo-3.0.3.jar:?]
        at org.apache.storm.serialization.KryoValuesDeserializer.deserializeFrom(KryoValuesDeserializer.java:37) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.serialization.KryoTupleDeserializer.deserialize(KryoTupleDeserializer.java:50) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.messaging.DeserializingConnectionCallback.recv(DeserializingConnectionCallback.java:56) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.messaging.netty.Server.enqueue(Server.java:134) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.messaging.netty.Server.received(Server.java:255) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.messaging.netty.StormServerHandler.messageReceived(StormServerHandler.java:61) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [storm-core-1.2.2.jar:1.2.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
Caused by: java.io.OptionalDataException
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1587) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427) ~[?:1.8.0_162]
        at java.util.HashMap.readObject(HashMap.java:1407) ~[?:1.8.0_162]
        at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_162]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_162]
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427) ~[?:1.8.0_162]
        at org.apache.storm.serialization.SerializableSerializer.read(SerializableSerializer.java:56) ~[storm-core-1.2.2.jar:1.2.2]
        ... 32 more
After this error is thrown, all other workers of this topology stop working, and we have to kill and redeploy it.

The logs of the other workers either show nothing useful or show:

2019-09-24 14:31:03.012 o.a.s.m.n.Client Thread-32-disruptor-worker-transfer-queue [ERROR] connection to Netty-Client-HOSTE_NAME/IP:PORT is unavailable
2019-09-24 14:31:03.053 o.a.s.m.n.Client client-worker-1 [WARN] Re-connection to HOSTE_NAME/IP:PORT was successful but 4 messages has been lost so far

Based on what you've posted, here are a few notes that may help you narrow down the problem:

Regarding the topology going idle, you should try upgrading to at least Storm 1.2.3. You might be affected by a known bug.

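The stack trace shows that you are using Java serialization for some tuples (the SerializableSerializer frames delegate to java.io.ObjectInputStream). I think what is happening here is that one of your workers serializes a tuple that is sent to another worker, and deserializing that tuple then fails.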
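What "deserializing that tuple fails" can look like is easy to reproduce with the JDK alone. In the minimal sketch below, CountedPairs is a hypothetical class that copies the shape of java.util.HashMap's custom serialized form: an entry count followed by the key and value objects. When the declared count is larger than the number of entries actually written, the reader runs past the end of the class's custom data, and ObjectInputStream throws OptionalDataException with eof set to true, the same exception type as in the trace above. How such a mismatch could arise in your topology is only a guess; one possibility is a HashMap being modified by another thread while it is being serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OptionalDataException;
import java.io.Serializable;

public class OptionalDataDemo {

    // Hypothetical stand-in for HashMap's serialized form: an int count, then key/value objects.
    static class CountedPairs implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient int declaredCount;
        private transient String[] keys;
        private transient Integer[] values;

        CountedPairs(int declaredCount, String[] keys, Integer[] values) {
            this.declaredCount = declaredCount;
            this.keys = keys;
            this.values = values;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(declaredCount);             // like HashMap writing its size
            for (int i = 0; i < keys.length; i++) {  // ...but only keys.length entries follow
                out.writeObject(keys[i]);
                out.writeObject(values[i]);
            }
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            int n = in.readInt();                    // the reader trusts the declared count
            keys = new String[n];
            values = new Integer[n];
            for (int i = 0; i < n; i++) {
                keys[i] = (String) in.readObject();
                values[i] = (Integer) in.readObject();
            }
            declaredCount = n;
        }

        int size() { return keys.length; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] keys = {"a", "b"};
        Integer[] values = {1, 2};

        // Consistent stream: the declared count matches the entries written.
        CountedPairs ok = (CountedPairs) deserialize(serialize(new CountedPairs(2, keys, values)));
        System.out.println("consistent stream, entries read: " + ok.size());

        // Inconsistent stream: 3 entries declared by the "sender", only 2 written.
        byte[] broken = serialize(new CountedPairs(3, keys, values));
        try {
            deserialize(broken); // the "receiver" side
        } catch (OptionalDataException e) {
            System.out.println("OptionalDataException, eof=" + e.eof + ", length=" + e.length);
        }
    }
}
```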

First, I think you should try turning off Java serialization. It is slow, and you probably don't want it in a production topology.

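Second, the next time the topology hangs (assuming upgrading to 1.2.3 doesn't fix it), I would try using jstack to get a thread dump of the idle worker. That should tell you why the topology isn't doing anything.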
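If jstack cannot be installed, a roughly equivalent dump can be produced from inside the JVM with the standard java.lang.management API. This is a sketch, not something Storm provides: ThreadDumper and dumpAllThreads are hypothetical names, and how you would trigger it inside a worker (for example from your own diagnostic code) is an assumption left to you.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class ThreadDumper {

    // Builds a jstack-like text dump of all live threads in the current JVM.
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                // The lock this thread is blocked on or waiting for.
                sb.append("    waiting on lock ").append(info.getLockName()).append('\n');
            }
            // Print every frame; ThreadInfo.toString() would truncate deep stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        // Report deadlocks as well; the full check needs synchronizer support.
        long[] deadlocked = synchronizers ? mx.findDeadlockedThreads() : mx.findMonitorDeadlockedThreads();
        if (deadlocked != null) {
            sb.append("Deadlocked thread ids: ").append(Arrays.toString(deadlocked)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a worker you would write this to the worker log instead of stdout.
        System.out.print(dumpAllThreads());
    }
}
```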


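Finally, you could try temporarily logging the byte arrays in Storm's deserialization code when you see the exception; they might tell you what is going wrong. The exception you are getting indicates that the byte array does not contain the data expected for a serialized HashMap. Adding this logging requires editing and building Storm yourself.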
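The temporary logging described above could look roughly like the following JDK-only sketch. DumpOnFailure, deserializeOrDump and toHex are hypothetical helpers, not Storm code; wiring something like this into Storm's own deserialization path would still mean patching and rebuilding Storm. It attempts a Java deserialization and, if that fails, writes the exception, the payload length and a hex dump of the bytes to stderr before rethrowing, so a failing payload can be compared with a good one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;
import java.util.HashMap;

public class DumpOnFailure {

    // Formats bytes as hex, 16 per line, each line prefixed with its offset.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i % 16 == 0) {
                if (i > 0) sb.append('\n');
                sb.append(String.format("%08x:", i));
            }
            sb.append(String.format(" %02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    // Java-deserializes data; on failure, logs the raw bytes and rethrows the original exception.
    static Object deserializeOrDump(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            System.err.println("Deserialization failed: " + e + " (" + data.length + " bytes)\n" + toHex(data));
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(map);
        }
        byte[] good = bytes.toByteArray();
        System.out.println(deserializeOrDump(good)); // prints {a=1}

        // Simulate a damaged payload by cutting the stream in half.
        byte[] damaged = Arrays.copyOf(good, good.length / 2);
        try {
            deserializeOrDump(damaged); // the hex dump goes to stderr
        } catch (IOException e) {
            System.out.println("rethrown: " + e.getClass().getSimpleName());
        }
    }
}
```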

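The problem is that in our production environment we are not allowed to just install new software, so tools like jstack, or building Storm ourselves, are not an option. Regarding the bug you posted, I don't think we are affected: our workers reconnect successfully after the failure, they just don't start working again. As you mentioned, we think the problem lies between serialization and deserialization when the workers communicate with each other. We just don't understand in which situation this can happen. We basically emit only one object, which contains some HashMaps. In what situations do workers communicate with each other and serialize/deserialize tuples?

Whenever a bolt or spout in one worker needs to send a tuple to another worker. Setting this option might help you: it forces Storm to serialize tuples even when they are sent between bolts inside the same worker.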
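In the same spirit as forcing Storm to serialize every tuple, a unit test can push the emitted object through a serialize/deserialize round trip and compare the result with the original, without needing a cluster. This sketch assumes, as described above, a payload that is a plain Serializable object holding some HashMaps; RoundTripCheck, Payload and roundTrip are hypothetical names, and only Java serialization (the path that fails in the stack trace) is exercised, not Storm's Kryo path. It catches payloads that do not survive serialization at all, but it will not reproduce a problem that only appears under concurrency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Objects;

public class RoundTripCheck {

    // Hypothetical payload: one object containing some HashMaps.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final HashMap<String, Integer> counts = new HashMap<>();
        final HashMap<String, String> tags = new HashMap<>();

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Payload)) return false;
            Payload p = (Payload) o;
            return counts.equals(p.counts) && tags.equals(p.tags);
        }

        @Override
        public int hashCode() {
            return Objects.hash(counts, tags);
        }
    }

    // Serializes value and reads it back, as a remote worker would receive a copy of it.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Payload p = new Payload();
        p.counts.put("events", 42);
        p.tags.put("source", "sensor-1");
        Payload copy = roundTrip(p);
        // prints "round trip equal: true, same instance: false"
        System.out.println("round trip equal: " + p.equals(copy) + ", same instance: " + (p == copy));
    }
}
```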