Apache Storm: Storm supervisor does not start when adding 3 nodes


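I am trying to test a Storm + Kafka + Trident job on a multi-node Storm cluster.

When I run the job on one machine, it runs and processes records. When I run it after adding a second worker, it also runs without any problems.

The problems start when I add a third worker to the cluster. I get the following in the worker logs: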

2014-07-16 16:47:56 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6701... [29]
2014-07-16 16:47:56 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6703... [30]
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6702... [30]
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6700... [29]
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6701... [30]
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-cassandra1/10.201.221.139:6703
2014-07-16 16:47:57 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-cassandra1/10.201.221.139:6703..., timeout: 600000ms, pendings: 0
2014-07-16 16:47:58 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-cassandra1/10.201.221.139:6702
2014-07-16 16:47:58 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-cassandra1/10.201.221.139:6702..., timeout: 600000ms, pendings: 0
2014-07-16 16:47:58 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-cassandra1/10.201.221.139:6700... [30]
2014-07-16 16:48:31 s.k.KafkaUtils [INFO] Metrics Tick: Not enough data to calculate spout lag.
2014-07-16 16:48:34 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-172.144.96.66.static.eigbox.net/66.96.144.172:6701... [6]
2014-07-16 16:48:34 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-172.144.96.66.static.eigbox.net/66.96.144.172:6703... [6]

In the supervisor log I get the following messages:

2014-07-16 16:47:26 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:27 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:27 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:28 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:28 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:29 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:29 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started
2014-07-16 16:47:30 b.s.d.supervisor [INFO] 1fdb9a02-1110-458c-b72e-91950fbbc5fd still hasn't started

The job does not run at all. My storm.yaml configuration is as follows:

storm.zookeeper.servers:
- "10.201.32.79"
# 
nimbus.host: "10.201.32.79"
storm.local.dir: "/home/hadoop/stormtmp"
java.library.path: "/opt/java7/lib"
#supervisor.slots.ports:
#    - 6700
#    - 6701
#    - 6702
#    - 6703
worker.childopts: "-Xmx2048m -XX:NewSize=1000m -XX:MaxNewSize=1000m"
nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
ui.port: 8084
ui.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"

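This basically says that the supervisor is unable to start the worker. Try looking in the supervisor log for a line like

b.s.d.supervisor [INFO] Launching worker with command: java -server …

Now copy that command and try running it on your supervisor machine to see whether you get any errors. If you do, you may need to adjust storm.yaml accordingly.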

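Was anyone able to help you with this?

I was able to resolve this by using Storm version 0.9.2. Storm 0.9.1 has a known bug that makes it fail like this; JIRA-187 was resolved in Storm 0.9.2. In addition, I increased the Netty min wait ms to 4000 ms and the max wait ms to 10000 ms. That seemed to do the trick. Thanks anyway.

storm.messaging.netty.max_retries=100
storm.messaging.netty.max_wait_ms=1200000

This solved the problem for me. Netty is very sensitive to timeouts; if they are not handled properly, the workers crash and the supervisor restarts them.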
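
For reference, the Netty settings mentioned in the replies above could be written into storm.yaml roughly as follows. This is only a sketch: it uses the 4000 ms / 10000 ms waits from the first reply and the retry count from the second, and it assumes that "min wait ms" refers to the key storm.messaging.netty.min_wait_ms, which the reply does not spell out. The values are the posters' own, not tuned recommendations.

```yaml
# storm.yaml fragment (sketch): Netty reconnect/backoff settings from the replies.
# Assumption: "min wait ms" in the first reply maps to storm.messaging.netty.min_wait_ms.
storm.messaging.netty.min_wait_ms: 4000    # first reply: minimum wait between reconnect attempts
storm.messaging.netty.max_wait_ms: 10000   # first reply; the second reply used 1200000 instead
storm.messaging.netty.max_retries: 100     # second reply: reconnect attempts before giving up
```

These keys would typically need to be set consistently on every node in the cluster, and the daemons restarted, before the change takes effect.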