Apache Storm topology does not start with parallelism hint of 1200


I have a Storm topology with three bolts (A, B, C), where the middle bolt takes around 450 ms mean time and the other two bolts take less than 1 ms.

I am able to run the topology with the following parallelism hint values:

A: 4 
B: 700
C: 10

Version Info: 
   "org.apache.storm" % "storm-core" % "1.2.1" 
   "org.apache.storm" % "storm-kafka-client" % "1.2.1" 

But when I increase the parallelism hint of B to 1200, the topology does not start. Meanwhile the worker process keeps getting restarted, and I don't see any errors in the topology logs or the Storm logs. In the topology logs, I see the executor for B being loaded multiple times, like below:

2018-05-18 18:56:37.462 o.a.s.d.executor main [INFO] Loading executor B:[111 111]
2018-05-18 18:56:37.463 o.a.s.d.executor main [INFO] Loaded executor tasks B:[111 111]
2018-05-18 18:56:37.465 o.a.s.d.executor main [INFO] Finished loading executor B:[111 111]
2018-05-18 18:56:37.528 o.a.s.d.executor main [INFO] Loading executor B:[355 355]
2018-05-18 18:56:37.529 o.a.s.d.executor main [INFO] Loaded executor tasks B:[355 355]
2018-05-18 18:56:37.530 o.a.s.d.executor main [INFO] Finished loading executor B:[355 355]
2018-05-18 18:56:37.666 o.a.s.d.executor main [INFO] Loading executor B:[993 993]
2018-05-18 18:56:37.667 o.a.s.d.executor main [INFO] Loaded executor tasks B:[993 993]
2018-05-18 18:56:37.669 o.a.s.d.executor main [INFO] Finished loading executor B:[993 993]
2018-05-18 18:56:37.713 o.a.s.d.executor main [INFO] Loading executor B:[765 765]
2018-05-18 18:56:37.714 o.a.s.d.executor main [INFO] Loaded executor tasks B:[765 765]
This keeps happening and the topology never comes up; it starts perfectly when the parallelism hint for bolt B is 700, with no other change.
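For scale: with Storm's default of one task per executor, the parallelism hints above translate directly into executor threads the workers must spawn, so raising B from 700 to 1200 adds 500 threads. A rough illustration using the numbers from this question:

```java
public class ExecutorCount {
    public static void main(String[] args) {
        // Parallelism hints from the question; by default Storm creates
        // one executor thread (running one task) per unit of parallelism hint.
        int a = 4, c = 10;

        int workingTotal = a + 700 + c;   // B at 700: topology starts fine
        int failingTotal = a + 1200 + c;  // B at 1200: topology never comes up

        System.out.println(workingTotal); // 714 executor threads
        System.out.println(failingTotal); // 1214 executor threads
    }
}
```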

One interesting log I see here, though I am not yet sure what it means:

Worker Process 766258fe-a604-4385-8eeb-e85cad38b674 exited with code: 143

Any suggestions?

Edit:

Storm logs from when the worker restarts:

2018-05-18 18:51:46.755 o.a.s.d.s.Container SLOT_6700 [INFO] Killing eaf4d8ce-e758-4912-a15d-6dab8cda96d0:766258fe-a604-4385-8eeb-e85cad38b674
2018-05-18 18:51:47.204 o.a.s.d.s.BasicContainer Thread-7 [INFO] Worker Process 766258fe-a604-4385-8eeb-e85cad38b674 exited with code: 143
2018-05-18 18:51:47.766 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE RUNNING msInState: 109081 topo:myTopology-1-1526649581 worker:766258fe-a604-4385-8eeb-e85cad38b674 -> KILL msInState: 0 topo:myTopology-1-1526649581 worker:766258fe-a604-4385-8eeb-e85cad38b674
2018-05-18 18:51:47.766 o.a.s.d.s.Container SLOT_6700 [INFO] GET worker-user for 766258fe-a604-4385-8eeb-e85cad38b674
2018-05-18 18:51:47.774 o.a.s.d.s.Slot SLOT_6700 [WARN] SLOT 6700 all processes are dead...
2018-05-18 18:51:47.775 o.a.s.d.s.Container SLOT_6700 [INFO] Cleaning up eaf4d8ce-e758-4912-a15d-6dab8cda96d0:766258fe-a604-4385-8eeb-e85cad38b674
2018-05-18 18:51:47.775 o.a.s.d.s.Container SLOT_6700 [INFO] GET worker-user for 766258fe-a604-4385-8eeb-e85cad38b674
2018-05-18 18:51:47.775 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/saurabh/storm-run/workers/766258fe-a604-4385-8eeb-e85cad38b674/pids/27798
2018-05-18 18:51:47.775 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/saurabh/storm-run/workers/766258fe-a604-4385-8eeb-e85cad38b674/heartbeats
2018-05-18 18:51:47.780 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/saurabh/storm-run/workers/766258fe-a604-4385-8eeb-e85cad38b674/pids
2018-05-18 18:51:47.780 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/saurabh/storm-run/workers/766258fe-a604-4385-8eeb-e85cad38b674/tmp
2018-05-18 18:51:47.781 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/saurabh/storm-run/workers/766258fe-a604-4385-8eeb-e85cad38b674
2018-05-18 18:51:47.782 o.a.s.d.s.Container SLOT_6700 [INFO] REMOVE worker-user 766258fe-a604-4385-8eeb-e85cad38b674
2018-05-18 18:51:47.782 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/saurabh/storm-run/workers-users/766258fe-a604-4385-8eeb-e85cad38b674
2018-05-18 18:51:47.783 o.a.s.d.s.BasicContainer SLOT_6700 [INFO] Removed Worker ID 766258fe-a604-4385-8eeb-e85cad38b674
2018-05-18 18:51:47.783 o.a.s.l.AsyncLocalizer SLOT_6700 [INFO] Released blob reference myTopology-1-1526649581 6700 Cleaning up BLOB references...
2018-05-18 18:51:47.784 o.a.s.l.AsyncLocalizer SLOT_6700 [INFO] Released blob reference myTopology-1-1526649581 6700 Cleaning up basic files...
2018-05-18 18:51:47.785 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/saurabh/storm-run/supervisor/stormdist/myTopology-1-1526649581
2018-05-18 18:51:47.808 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE KILL msInState: 42 topo:myTopology-1-1526649581 worker:null -> EMPTY msInState: 0
Edit:

Logs from strace -fp PID -e trace=read,write,network,signal,ipc:

I can't fully make sense of these yet; some possibly relevant lines:

[pid 3362] open("/usr/lib/locale/UTF-8/LC_CTYPE", O_RDONLY) = -1 ENOENT (No such file or directory)

[pid 3362] kill(1487, SIGTERM) = 0

[pid 3362] close(1)


A quick google suggests that 143 is the exit code when the JVM receives a SIGTERM. You may be running out of memory, or the OS may be killing the process for some other reason. Keep in mind that setting the parallelism hint to 1200 means you get 1200 tasks (copies) of bolt B, where previously you only had 700.
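As a sanity check on that reading of the exit code (assuming the common POSIX shell convention that a process killed by signal N reports exit status 128 + N):

```java
public class ExitCode143 {
    public static void main(String[] args) {
        // Common convention: exit status of a signal-killed process is 128 + signal number
        final int SIGTERM = 15;
        int exitStatus = 128 + SIGTERM;
        System.out.println(exitStatus); // 143, matching the worker log above
    }
}
```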

I was able to get this running by tweaking the configs below. It seems the topology was timing out due to nimbus.task.launch.secs, which was set to 120: the worker gets restarted if it has not started within 120 seconds.

Updated values for some of these configs:

topology.worker.childopts: -Xms1g -Xmx16g
topology.worker.logwriter.childopts: -Xmx1024m
topology.worker.max.heap.size.mb: 3072.0
worker.childopts: -Xms1g -Xmx16g -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1%ID% -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -XX:+UseG1GC -XX:+AggressiveOpts -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/home/saurabh.mimani/apache-storm-1.2.1/logs/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Dorg.newsclub.net.unix.library.path=/usr/share/specter/uds-lib/
worker.gc.childopts:
worker.heap.memory.mb: 8192
supervisor.childopts: -Xms1g -Xmx16g
Regarding nimbus.task.launch.secs:

A special timeout used when a task is initially launched. During launch, this is the timeout used until the first heartbeat, overriding nimbus.task.timeout.secs. A separate timeout exists for launch because there can be quite a bit of overhead to starting new JVMs and configuring them.
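A sketch of how I read that description (the method and names here are illustrative, not Storm's actual implementation): until the first heartbeat arrives, the launch timeout applies; afterwards the regular task timeout takes over.

```java
public class LaunchTimeout {
    // Illustrative only: models which timeout is in effect per the description above.
    static int effectiveTimeoutSecs(boolean firstHeartbeatSeen,
                                    int taskTimeoutSecs, int taskLaunchSecs) {
        return firstHeartbeatSeen ? taskTimeoutSecs : taskLaunchSecs;
    }

    public static void main(String[] args) {
        int taskTimeoutSecs = 30;    // nimbus.task.timeout.secs (Storm's default)
        int taskLaunchSecs = 1200;   // nimbus.task.launch.secs after this fix

        // Before the first heartbeat, the (longer) launch timeout is in effect.
        System.out.println(effectiveTimeoutSecs(false, taskTimeoutSecs, taskLaunchSecs)); // 1200
        // Once the worker has heartbeated, the regular task timeout takes over.
        System.out.println(effectiveTimeoutSecs(true, taskTimeoutSecs, taskLaunchSecs));  // 30
    }
}
```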


I have plenty of memory on that machine, I can see it showing as available, and I am not getting any out-of-memory errors. I have given the worker a maximum of 16 GB of memory, which should be enough to spawn 1200 threads. I have added some of the relevant configs to the question. I may try strace in this case to figure out where the SIGTERM is coming from. I have also updated the question with the strace logs, and am still trying to make sense of them.
Updated timeout-related configs:

drpc.request.timeout.secs: 1600
supervisor.worker.start.timeout.secs: 1200
nimbus.supervisor.timeout.secs: 1200
nimbus.task.launch.secs: 1200