Vertx/Hazelcast cluster members on AWS take 10 minutes, sometimes even longer, to join the cluster

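When AWS adds an extra member to the cluster under heavy load, startup takes 2 minutes, but joining the cluster takes 10 minutes, sometimes even longer.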

Some details:

  • Vertx 3.9.2
  • Hazelcast 3.12.9
  • hazelcast-aws 2.4

We upgraded from Vertx 3.5.1 (Hazelcast 3.8.2, hazelcast-aws 2.1.0), and that version did not have this problem. Joining the cluster was fast (within a few seconds).

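We did not expect this behavior when upgrading. It also defeats the purpose of adding an extra server under heavy load: the new member should be up and running as quickly as possible.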

Thanks for your help.

This is the startup log up to the point where it joins (2020-10-01T21:50:38.404+02:00):

[main] 2020-10-01T21:39:24.632+02:00 INFO [io.vertx.core.impl.launcher.commands.RunCommand]  Starting clustering...
[main] 2020-10-01T21:39:24.684+02:00 INFO [io.vertx.core.impl.launcher.commands.RunCommand]  No cluster-host specified so using address 10.0.88.53
[vert.x-worker-thread-0] 2020-10-01T21:39:25.82+02:00 INFO [com.hazelcast.instance.AddressPicker]  [LOCAL]  [3.12.9] Prefer IPv4 stack is true, prefer IPv6 addresses is false
[vert.x-worker-thread-0] 2020-10-01T21:39:25.826+02:00 INFO [com.hazelcast.instance.AddressPicker]  [LOCAL]  [3.12.9] Picked [10.0.88.53]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
[vert.x-worker-thread-0] 2020-10-01T21:39:25.826+02:00 INFO [com.hazelcast.instance.AddressPicker]  [LOCAL]  [3.12.9] Using public address: [10.0.88.53]:5701
[vert.x-worker-thread-0] 2020-10-01T21:39:25.843+02:00 INFO [com.hazelcast.system]  [10.0.88.53]:5701  [3.12.9] Hazelcast 3.12.9 (20200819 - 3638e8b) starting at [10.0.88.53]:5701
[vert.x-worker-thread-0] 2020-10-01T21:39:25.843+02:00 INFO [com.hazelcast.system]  [10.0.88.53]:5701  [3.12.9] Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
[vert.x-worker-thread-0] 2020-10-01T21:39:26.109+02:00 INFO [com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator]  [10.0.88.53]:5701  [3.12.9] Backpressure is disabled
[vert.x-worker-thread-0] 2020-10-01T21:39:26.622+02:00 WARNING [com.hazelcast.aws.AwsDiscoveryStrategy]  Describe instances will be queried with iam-role assigned to EC2 instance, please make sure given iam-role have ec2:DescribeInstances policy attached.
[vert.x-worker-thread-0] 2020-10-01T21:39:26.647+02:00 WARNING [com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager]  [10.0.88.53]:5701  [3.12.9] When using Phi-Accrual Failure Detector, please consider using a lower 'hazelcast.max.no.heartbeat.seconds' value. Current is: 60 seconds.
[vert.x-worker-thread-0] 2020-10-01T21:39:26.775+02:00 INFO [com.hazelcast.instance.Node]  [10.0.88.53]:5701  [3.12.9] Activating Discovery SPI Joiner
[vert.x-worker-thread-0] 2020-10-01T21:39:27.049+02:00 INFO [com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl]  [10.0.88.53]:5701  [3.12.9] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
[vert.x-worker-thread-0] 2020-10-01T21:39:27.054+02:00 INFO [com.hazelcast.internal.diagnostics.Diagnostics]  [10.0.88.53]:5701  [3.12.9] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
[vert.x-worker-thread-0] 2020-10-01T21:39:27.065+02:00 INFO [com.hazelcast.core.LifecycleService]  [10.0.88.53]:5701  [3.12.9] [10.0.88.53]:5701 is STARTING
[hz._hzInstance_1_ayton.IO.thread-in-0] 2020-10-01T21:39:27.11+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=1, /10.0.88.53:5701->/10.0.89.28:5000, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
[hz._hzInstance_1_ayton.IO.thread-in-1] 2020-10-01T21:39:27.119+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=2, /10.0.88.53:5701->/10.0.89.74:61352, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
[vert.x-worker-thread-0] 2020-10-01T21:39:27.135+02:00 INFO [com.hazelcast.internal.cluster.ClusterService]  [10.0.88.53]:5701  [3.12.9]
Members {size:1, ver:1} [
    Member [10.0.88.53]:5701 - a3ee7e04-f292-4e8b-aaa8-dd84bd6c879f this
]

[vert.x-worker-thread-0] 2020-10-01T21:39:27.183+02:00 INFO [com.hazelcast.internal.jmx.ManagementService]  [10.0.88.53]:5701  [3.12.9] Hazelcast JMX agent enabled.
[vert.x-worker-thread-0] 2020-10-01T21:39:27.216+02:00 INFO [com.hazelcast.core.LifecycleService]  [10.0.88.53]:5701  [3.12.9] [10.0.88.53]:5701 is STARTED
[vert.x-worker-thread-3] 2020-10-01T21:39:27.522+02:00 INFO [com.hazelcast.internal.partition.impl.PartitionStateManager]  [10.0.88.53]:5701  [3.12.9] Initializing cluster partition table arrangement...
[vert.x-eventloop-thread-1] 2020-10-01T21:39:35.764+02:00 INFO [io.vertx.core.impl.launcher.commands.VertxIsolatedDeployer]  Succeeded in deploying worker verticle
[hz._hzInstance_1_ayton.IO.thread-in-2] 2020-10-01T21:39:36.473+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=3, /10.0.88.53:5701->/10.0.89.28:5004, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
[hz._hzInstance_1_ayton.IO.thread-in-0] 2020-10-01T21:39:36.619+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=4, /10.0.88.53:5701->/10.0.89.74:61360, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
[hz._hzInstance_1_ayton.IO.thread-in-1] 2020-10-01T21:39:46.472+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=5, /10.0.88.53:5701->/10.0.89.28:5018, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
...
removed lines, just repeating the same over and over again...
...
[hz._hzInstance_1_ayton.IO.thread-in-0] 2020-10-01T21:50:16.637+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=139, /10.0.88.53:5701->/10.0.89.74:62058, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
[hz._hzInstance_1_ayton.IO.thread-in-2] 2020-10-01T21:50:22.069+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Initialized new cluster connection between /10.0.88.53:55779 and /10.0.128.23:5701
[hz._hzInstance_1_ayton.IO.thread-in-1] 2020-10-01T21:50:26.49+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=141, /10.0.88.53:5701->/10.0.89.28:5704, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
[hz._hzInstance_1_ayton.IO.thread-in-2] 2020-10-01T21:50:26.637+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=142, /10.0.88.53:5701->/10.0.89.74:62070, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
[hz._hzInstance_1_ayton.IO.thread-in-2] 2020-10-01T21:50:36.49+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=143, /10.0.88.53:5701->/10.0.89.28:5710, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
[hz._hzInstance_1_ayton.IO.thread-in-0] 2020-10-01T21:50:36.638+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=144, /10.0.88.53:5701->/10.0.89.74:62086, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Connection closed by the other side
[hz._hzInstance_1_ayton.IO.thread-in-1] 2020-10-01T21:50:37.267+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Initialized new cluster connection between /10.0.88.53:59871 and /10.0.128.77:5701
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.277+02:00 INFO [com.hazelcast.internal.cluster.impl.ClusterJoinManager]  [10.0.88.53]:5701  [3.12.9] We should merge to [10.0.128.77]:5701 because their data member count is bigger than ours [22 > 1]
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.277+02:00 WARNING [com.hazelcast.internal.cluster.impl.DiscoveryJoiner]  [10.0.88.53]:5701  [3.12.9] [10.0.88.53]:5701 is merging [tcp/ip] to [10.0.128.77]:5701
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.283+02:00 INFO [com.hazelcast.internal.cluster.impl.operations.LockClusterStateOp]  [10.0.88.53]:5701  [3.12.9] Locking cluster state. Initiator: [10.0.88.53]:5701, lease-time: 60000
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.284+02:00 INFO [com.hazelcast.internal.cluster.impl.operations.LockClusterStateOp]  [10.0.88.53]:5701  [3.12.9] Extending cluster state lock. Initiator: [10.0.88.53]:5701, lease-time: 20000
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.284+02:00 INFO [com.hazelcast.internal.cluster.impl.operations.CommitClusterStateOp]  [10.0.88.53]:5701  [3.12.9] Changing cluster state from ClusterState{state=ACTIVE, lock=LockGuard{lockOwner=[10.0.88.53]:5701, lockOwnerId='08fdbb39-f82a-450c-9b59-5a667d79de3d', lockExpiryTime=1601581917283}} to ClusterStateChange{type=class com.hazelcast.cluster.ClusterState, newState=FROZEN}, initiator: [10.0.88.53]:5701, transient: false
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.285+02:00 WARNING [com.hazelcast.internal.cluster.impl.operations.MergeClustersOp]  [10.0.88.53]:5701  [3.12.9] [10.0.88.53]:5701 is merging to [10.0.128.77]:5701, because: instructed by master [10.0.88.53]:5701
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.286+02:00 INFO [com.hazelcast.core.LifecycleService]  [10.0.88.53]:5701  [3.12.9] [10.0.88.53]:5701 is MERGING
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.289+02:00 WARNING [com.hazelcast.internal.cluster.ClusterService]  [10.0.88.53]:5701  [3.12.9] Resetting local member UUID. Previous: a3ee7e04-f292-4e8b-aaa8-dd84bd6c879f, new: 4dde9e5d-4cab-449f-85f4-93a858144a45
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.289+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=92, /10.0.88.53:52493->/10.0.32.68:5701, qualifier=null, endpoint=[10.0.32.68]:5701, alive=false, type=MEMBER] closed. Reason: EndpointManager is stopping
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.29+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=79, /10.0.88.53:41018->/10.0.24.49:5701, qualifier=null, endpoint=[10.0.24.49]:5701, alive=false, type=MEMBER] closed. Reason: EndpointManager is stopping
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.29+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=65, /10.0.88.53:60034->/10.0.32.161:5701, qualifier=null, endpoint=[10.0.32.161]:5701, alive=false, type=MEMBER] closed. Reason: EndpointManager is stopping
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.291+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=145, /10.0.88.53:59871->/10.0.128.77:5701, qualifier=null, endpoint=[10.0.128.77]:5701, alive=false, type=MEMBER] closed. Reason: EndpointManager is stopping
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.291+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=95, /10.0.88.53:51695->/10.0.16.109:5701, qualifier=null, endpoint=[10.0.16.109]:5701, alive=false, type=MEMBER] closed. Reason: EndpointManager is stopping
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.291+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=123, /10.0.88.53:49343->/10.0.24.73:5701, qualifier=null, endpoint=[10.0.24.73]:5701, alive=false, type=MEMBER] closed. Reason: EndpointManager is stopping
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.291+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=102, /10.0.88.53:42646->/10.0.80.44:5701, qualifier=null, endpoint=[10.0.80.44]:5701, alive=false, type=MEMBER] closed. Reason: EndpointManager is stopping
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.291+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=68, /10.0.88.53:57395->/10.0.128.107:5701, qualifier=null, endpoint=[10.0.128.107]:5701, alive=false, type=MEMBER] closed. Reason: EndpointManager is stopping
[hz._hzInstance_1_ayton.cached.thread-4] 2020-10-01T21:50:37.291+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Connection[id=140, /10.0.88.53:55779->/10.0.128.23:5701, qualifier=null, endpoint=[10.0.128.23]:5701, alive=false, type=MEMBER] closed. Reason: EndpointManager is stopping
[hz._hzInstance_1_ayton.cached.thread-6] 2020-10-01T21:50:37.367+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnector]  [10.0.88.53]:5701  [3.12.9] Connecting to /10.0.128.77:5701, timeout: 10000, bind-any: true
[hz._hzInstance_1_ayton.IO.thread-in-2] 2020-10-01T21:50:37.37+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Initialized new cluster connection between /10.0.88.53:36762 and /10.0.128.77:5701
[hz._hzInstance_1_ayton.IO.thread-in-0] 2020-10-01T21:50:38.375+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Initialized new cluster connection between /10.0.88.53:5701 and /10.0.80.44:48392
[hz._hzInstance_1_ayton.IO.thread-in-1] 2020-10-01T21:50:38.378+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Initialized new cluster connection between /10.0.88.53:5701 and /10.0.128.23:58847
[hz._hzInstance_1_ayton.IO.thread-in-2] 2020-10-01T21:50:38.392+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Initialized new cluster connection between /10.0.88.53:5701 and /10.0.128.107:44948
[hz._hzInstance_1_ayton.IO.thread-in-0] 2020-10-01T21:50:38.392+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Initialized new cluster connection between /10.0.88.53:5701 and /10.0.32.68:45484
[hz._hzInstance_1_ayton.IO.thread-in-2] 2020-10-01T21:50:38.403+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Initialized new cluster connection between /10.0.88.53:5701 and /10.0.32.161:41415
[hz._hzInstance_1_ayton.IO.thread-in-1] 2020-10-01T21:50:38.404+02:00 INFO [com.hazelcast.nio.tcp.TcpIpConnection]  [10.0.88.53]:5701  [3.12.9] Initialized new cluster connection between /10.0.88.53:5701 and /10.0.16.47:38166
[hz._hzInstance_1_ayton.priority-generic-operation.thread-0] 2020-10-01T21:50:38.418+02:00 INFO [com.hazelcast.internal.cluster.ClusterService]  [10.0.88.53]:5701  [3.12.9]
Members {size:23, ver:65} [
    Member [10.0.128.77]:5701 - 48850de5-b76b-4f29-a09d-865a6e2485a8
    Member [10.0.96.21]:5701 - e6069c27-ea44-4b67-b9bf-925da710a4e1
    Member [10.0.48.36]:5701 - a1bfc076-02f8-4e2a-8923-a8c573838cab
    Member [10.0.96.87]:5701 - e47ad382-dfe6-4ff5-be40-ef04454aef18
    Member [10.0.64.188]:5701 - a5c0a154-0f79-4a30-9f57-5dda8cf91c91
    Member [10.0.24.73]:5701 - f1a1065c-d9ed-4218-a398-1eea010dcbca
    Member [10.0.24.164]:5701 - 84631ecd-3618-47e9-8dbb-3d37455ffb46
    Member [10.0.24.49]:5701 - fddc3f92-02b2-4617-a318-c3bbe1247e69
    Member [10.0.128.134]:5701 - 96b50f0a-315d-421d-ace7-9ce236211864
    Member [10.0.64.26]:5701 - 095d0b5e-0e92-45b7-a691-35857c65273b
    Member [10.0.48.116]:5701 - 7ad22571-d788-4107-a3d7-322ce9986dc0
    Member [10.0.104.173]:5701 - 7cf07c07-b218-4440-b93d-8091cef390cf
    Member [10.0.128.107]:5701 - f58a7ad5-f062-4751-bd9a-d52adf55a735
    Member [10.0.104.124]:5701 - 917854d2-42f8-42db-935d-39a837f5cc21
    Member [10.0.80.44]:5701 - 84d66d93-0308-4dc5-ac57-3b5d614f0aca
    Member [10.0.80.175]:5701 - b3aa5fcf-f92b-4723-a1c7-c354f2cf1713
    Member [10.0.32.68]:5701 - 64ada680-c58b-4340-8776-1b7cdaf1e2b7
    Member [10.0.128.23]:5701 - d3f8951e-9530-4824-999f-1636a66cda8b
    Member [10.0.32.161]:5701 - f729ae0c-a610-41c3-92ff-20c68996c1b4
    Member [10.0.16.109]:5701 - 1993e66b-12b4-4cd4-8541-dcbbbf6b5ecf
    Member [10.0.88.90]:5701 - aba4bde6-96de-41ff-9413-da8e35dde3ac
    Member [10.0.16.47]:5701 - ef0f8d1f-edbd-4802-bb9c-52c102c74b8c
    Member [10.0.88.53]:5701 - 4dde9e5d-4cab-449f-85f4-93a858144a45 this
]

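To put numbers on the delay visible in this log, here is a minimal sketch that parses the timestamps of a few log lines and prints the elapsed time between them. The class name `JoinDelay` and its helper methods are hypothetical and written only for this illustration. The regular expression assumes the `[thread] timestamp LEVEL [logger]  message` layout seen above. It only measures the gap and says nothing about its cause.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Vert.x or Hazelcast.
class JoinDelay {
    // Assumed line layout: "[thread] <ISO-8601 timestamp> LEVEL [logger]  message"
    private static final Pattern LINE =
        Pattern.compile("^\\[([^\\]]+)\\] (\\S+) (\\w+) \\[([^\\]]+)\\]\\s+(.*)$");

    /** Returns the timestamp of a log line, or null if the line has a different shape. */
    static OffsetDateTime timestampOf(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // OffsetDateTime.parse accepts 0 to 9 fractional digits, so ".82" and ".632" both work.
        return OffsetDateTime.parse(m.group(2));
    }

    /** Elapsed time between two log lines that both match the assumed layout. */
    static Duration between(String first, String second) {
        return Duration.between(timestampOf(first), timestampOf(second));
    }

    public static void main(String[] args) {
        // Lines copied from the log above.
        String clusteringStart = "[main] 2020-10-01T21:39:24.632+02:00 INFO "
            + "[io.vertx.core.impl.launcher.commands.RunCommand]  Starting clustering...";
        String started = "[vert.x-worker-thread-0] 2020-10-01T21:39:27.216+02:00 INFO "
            + "[com.hazelcast.core.LifecycleService]  [10.0.88.53]:5701  [3.12.9] [10.0.88.53]:5701 is STARTED";
        String fullMemberList = "[hz._hzInstance_1_ayton.priority-generic-operation.thread-0] "
            + "2020-10-01T21:50:38.418+02:00 INFO [com.hazelcast.internal.cluster.ClusterService]  "
            + "[10.0.88.53]:5701  [3.12.9]";

        System.out.println("Started as a single-member cluster after: "
            + between(clusteringStart, started));
        System.out.println("Saw the 23-member list after: "
            + between(clusteringStart, fullMemberList));
    }
}
```

For the lines above, the first interval is about 2.6 seconds and the second is about 11 minutes 14 seconds, which matches the observation that the member starts quickly on its own but only sees the full member list much later.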