Hazelcast ScheduledExecutorService loses tasks after node shutdown
I am trying to use Hazelcast's ScheduledExecutorService to run some periodic tasks. I am using Hazelcast 3.8.1. I start one node, then a second one, and the tasks are distributed between the two nodes and executed correctly. If I shut down the first node, the second node starts executing the periodic tasks that previously ran on the first node. The problem is that if I stop the second node instead of the first, its tasks are not rescheduled onto the first node. This also happens with more nodes: if I shut down the last node that received the tasks, those tasks are lost. Shutdown is always done with Ctrl+C.

I created a test application using some sample code from the Hazelcast examples plus a few snippets found on the web, and I start two instances of this application:
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.Member;
import com.hazelcast.scheduledexecutor.IScheduledExecutorService;
import com.hazelcast.scheduledexecutor.IScheduledFuture;

import static com.hazelcast.scheduledexecutor.TaskUtils.named;

public class MasterMember {

    static final Logger logger = LoggerFactory.getLogger(MasterMember.class);

    public static void main(String[] args) throws Exception {
        Config config = new Config();
        config.setProperty("hazelcast.logging.type", "slf4j");
        config.getScheduledExecutorConfig("scheduler")
                .setPoolSize(16)
                .setCapacity(100)
                .setDurability(1);

        final HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);

        // Graceful shutdown: wait until partition migrations are done before leaving
        Runtime.getRuntime().addShutdownHook(new Thread() {

            HazelcastInstance threadInstance = instance;

            @Override
            public void run() {
                logger.info("Application shutdown");
                for (int i = 0; i < 12; i++) {
                    logger.info("Verifying whether it is safe to close this instance");
                    boolean isSafe = getResultsForAllInstances(hzi -> {
                        if (hzi.getLifecycleService().isRunning()) {
                            return hzi.getPartitionService().forceLocalMemberToBeSafe(10, TimeUnit.SECONDS);
                        }
                        return true;
                    });
                    if (isSafe) {
                        logger.info("Verifying whether cluster is safe.");
                        isSafe = getResultsForAllInstances(hzi -> {
                            if (hzi.getLifecycleService().isRunning()) {
                                return hzi.getPartitionService().isClusterSafe();
                            }
                            return true;
                        });
                        if (isSafe) {
                            logger.info("Cluster is safe.");
                            break;
                        }
                    }
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
                threadInstance.shutdown();
            }

            private boolean getResultsForAllInstances(
                    Function<HazelcastInstance, Boolean> hazelcastInstanceBooleanFunction) {
                return Hazelcast.getAllHazelcastInstances().stream()
                        .map(hazelcastInstanceBooleanFunction)
                        .reduce(true, (old, next) -> old && next);
            }
        });

        // Schedule six periodic tasks after a short startup delay
        new Thread(() -> {
            try {
                Thread.sleep(10000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            IScheduledExecutorService scheduler = instance.getScheduledExecutorService("scheduler");
            scheduler.scheduleAtFixedRate(named("1", new EchoTask("1")), 5, 10, TimeUnit.SECONDS);
            scheduler.scheduleAtFixedRate(named("2", new EchoTask("2")), 5, 10, TimeUnit.SECONDS);
            scheduler.scheduleAtFixedRate(named("3", new EchoTask("3")), 5, 10, TimeUnit.SECONDS);
            scheduler.scheduleAtFixedRate(named("4", new EchoTask("4")), 5, 10, TimeUnit.SECONDS);
            scheduler.scheduleAtFixedRate(named("5", new EchoTask("5")), 5, 10, TimeUnit.SECONDS);
            scheduler.scheduleAtFixedRate(named("6", new EchoTask("6")), 5, 10, TimeUnit.SECONDS);
        }).start();

        // Periodically log the statistics of every scheduled task in the cluster
        new Thread(() -> {
            try {
                // delays init
                Thread.sleep(20000);
                while (true) {
                    IScheduledExecutorService scheduler = instance.getScheduledExecutorService("scheduler");
                    final Map<Member, List<IScheduledFuture<Object>>> allScheduledFutures =
                            scheduler.getAllScheduledFutures();
                    for (final List<IScheduledFuture<Object>> entry : allScheduledFutures.values()) {
                        for (final IScheduledFuture<Object> future : entry) {
                            logger.info(
                                    "TaskStats: name {} isDone() {} isCancelled() {} total runs {} delay (sec) {} other statistics {}",
                                    future.getHandler().getTaskName(), future.isDone(),
                                    future.isCancelled(),
                                    future.getStats().getTotalRuns(),
                                    future.getDelay(TimeUnit.SECONDS),
                                    future.getStats());
                        }
                    }
                    Thread.sleep(15000);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        while (true) {
            Thread.sleep(1000);
        }
        // Hazelcast.shutdownAll();
    }
}
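For reference, the same executor could be configured declaratively (a sketch assuming the Hazelcast 3.8 XML schema). The durability setting is the number of backup replicas kept for each task's state, so with durability 1 a task's state should survive the loss of one member:

```xml
<!-- Fragment of hazelcast.xml (sketch, assuming the 3.8 schema) -->
<scheduled-executor-service name="scheduler">
    <pool-size>16</pool-size>
    <capacity>100</capacity>
    <durability>1</durability>
</scheduled-executor-service>
```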
What am I doing wrong?

Thanks in advance.
--EDIT-- I modified the code (and updated it above) to use the logger instead of System.out, added logging of the task statistics, and fixed the usage of the Config object. Logs:
I forgot to mention that before starting the second node I wait until all tasks are running on the first node. I was able to quick-fix this by changing Hazelcast's ScheduledExecutorContainer class (using the 3.8.1 sources), namely its promoteStash() method. Basically, I added a condition for the case where a task was cancelled by a previous data migration. I don't yet know what side effects this change might have, or whether this is the best way to do it.
void promoteStash() {
    for (ScheduledTaskDescriptor descriptor : tasks.values()) {
        try {
            if (logger.isFinestEnabled()) {
                logger.finest("[Partition: " + partitionId + "] " + "Attempt to promote stashed " + descriptor);
            }
            if (descriptor.shouldSchedule()) {
                doSchedule(descriptor);
            } else if (descriptor.getTaskResult() != null && descriptor.getTaskResult().isCancelled()
                    && descriptor.getScheduledFuture() == null) {
                // tasks that were already present in this node, once they get sent back to this node, since they
                // have been cancelled when migrating the task to other node, are not rescheduled...
                logger.fine("[Partition: " + partitionId + "] " + "Attempt to promote stashed canceled task "
                        + descriptor);
                descriptor.setTaskResult(null);
                doSchedule(descriptor);
            }
            descriptor.setTaskOwner(true);
        } catch (Exception e) {
            throw rethrow(e);
        }
    }
}
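To make the quick fix's condition concrete, here is a minimal stdlib-only model (all class and field names are hypothetical stand-ins, not Hazelcast's actual internals) of the descriptor states promoteStash() distinguishes: a task that should be scheduled normally, one carrying a "cancelled" result from a previous migration with no live future (which the patch now reschedules), and one that stays stashed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hazelcast's ScheduledTaskDescriptor: just enough
// state to model the promoteStash() decision (names are illustrative only).
class Descriptor {
    final String name;
    final boolean shouldSchedule;  // normal promotion path
    Boolean resultCancelled;       // null = no result recorded yet
    final boolean hasFuture;       // a live future already exists for this task

    Descriptor(String name, boolean shouldSchedule, Boolean resultCancelled, boolean hasFuture) {
        this.name = name;
        this.shouldSchedule = shouldSchedule;
        this.resultCancelled = resultCancelled;
        this.hasFuture = hasFuture;
    }
}

public class PromoteStashModel {
    // Mirrors the patched condition: reschedule either when the descriptor says
    // so, or when it holds a "cancelled" result from a migration and no future.
    static List<String> promote(List<Descriptor> tasks) {
        List<String> rescheduled = new ArrayList<>();
        for (Descriptor d : tasks) {
            if (d.shouldSchedule) {
                rescheduled.add(d.name);
            } else if (Boolean.TRUE.equals(d.resultCancelled) && !d.hasFuture) {
                d.resultCancelled = null;  // clear the stale migration result
                rescheduled.add(d.name);   // ...and schedule the task again
            }
        }
        return rescheduled;
    }

    public static void main(String[] args) {
        List<Descriptor> tasks = new ArrayList<>();
        tasks.add(new Descriptor("fresh", true, null, false));     // normal promotion
        tasks.add(new Descriptor("migrated", false, true, false)); // cancelled by migration
        tasks.add(new Descriptor("running", false, null, true));   // already live
        System.out.println(promote(tasks)); // [fresh, migrated]
    }
}
```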
Bruno, thanks for reporting this; it really is a bug. Unfortunately, with multiple nodes it is not as obvious as with just two. As you figured out in your answer, the task is not lost, but rather cancelled after a migration. However, your fix is not safe, because a task can be cancelled while having a null future, e.g., when you cancel the master replica, the backups, which have no future, just get the result. The fix is very close to what you did: in prepareForReplication(), when in migrationMode, we avoid setting the result. I will push a fix shortly; just running a few more tests. It will be available in master and later versions.

I logged an issue about your finding; if you don't mind, you can track its status there.

Thanks a lot. Will this patch be available when 3.9 is released? Is there an estimated release date? — It will also be backported to 3.8.3, but there is no date for either version yet; both should be out very soon.
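The safer fix described above can be sketched with a minimal stdlib-only model (hypothetical method names, not the actual Hazelcast patch): when a member cancels its local future because the task is migrating away, prepareForReplication() must not copy that "cancelled" result into the replicated snapshot, so the receiving member sees a schedulable descriptor instead of a cancelled one.

```java
// Hypothetical model (not Hazelcast's actual code) contrasting the buggy and
// fixed replication behaviour for a task cancelled on the migrating member.
public class ReplicationModel {

    // Pre-fix behaviour: a migration-time cancellation leaks into the snapshot,
    // so the receiving member sees the task as cancelled and never reschedules it.
    static Boolean replicatedResultBuggy(boolean locallyCancelled, boolean migrationMode) {
        return locallyCancelled ? Boolean.TRUE : null; // migrationMode ignored
    }

    // Post-fix behaviour: while in migrationMode the result is withheld; only a
    // genuine user cancellation (outside migration) is replicated.
    static Boolean replicatedResultFixed(boolean locallyCancelled, boolean migrationMode) {
        return (locallyCancelled && !migrationMode) ? Boolean.TRUE : null;
    }

    public static void main(String[] args) {
        System.out.println("buggy, migrating:   " + replicatedResultBuggy(true, true));  // true
        System.out.println("fixed, migrating:   " + replicatedResultFixed(true, true));  // null
        System.out.println("fixed, user cancel: " + replicatedResultFixed(true, false)); // true
    }
}
```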