Apache ZooKeeper: Distributed State Machine's ZooKeeper ensemble fails while processing parallel regions, with error KeeperErrorCode = BadVersion

Tags: apache-zookeeper, spring-statemachine, apache-curator

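Background:

Diagram: [state machine diagram]

We have a normal state machine, as shown in the diagram, which monitors Spring Batch microservices (deployed on a streams source/processor/sink design) for each batch that is started.

We receive a sequence of REST calls that trigger events on the machine object belonging to each batch id, i.e. a new state machine object is created per batch id.

Each machine has n parallel regions (representing the chunks of the Spring Batch job), as shown in the diagram.

The REST calls come from a multi-threaded environment, in which two calls for the same batchId may arrive simultaneously for different region ids of the BATCHPROCESSING state.

So far we have had only one node (a single installation) running this state machine microservice, but now we want to deploy it on multiple instances that all receive the REST calls. For this we want to introduce the Distributed State Machine. We prepared the following configuration to run it: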

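import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.statemachine.config.EnableStateMachine;
import org.springframework.statemachine.config.StateMachineConfigurerAdapter;
import org.springframework.statemachine.config.builders.StateMachineConfigurationConfigurer;
import org.springframework.statemachine.config.builders.StateMachineModelConfigurer;
import org.springframework.statemachine.config.model.StateMachineModelFactory;
import org.springframework.statemachine.ensemble.StateMachineEnsemble;
import org.springframework.statemachine.zookeeper.ZookeeperStateMachineEnsemble;

@Configuration
@EnableStateMachine
public class StateMachineUMLWayConfiguration extends StateMachineConfigurerAdapter<String, String> {

    ..
    ..

    // Build the machine model from the UML template via the custom factory.
    @Override
    public void configure(StateMachineModelConfigurer<String, String> model) throws Exception {
        model
            .withModel()
                .factory(stateMachineModelFactory());
    }

    @Bean
    public StateMachineModelFactory<String, String> stateMachineModelFactory() {
        StorehubBatchUmlStateMachineModelFactory factory;
        try {
            factory = new StorehubBatchUmlStateMachineModelFactory(templateUMLInClasspath, stateMachineEnsemble());
        } catch (Exception e) {
            // Fail fast: continuing with a null factory would only cause a NullPointerException below.
            throw new IllegalStateException("Config's state machine factory got exception", e);
        }
        LOGGER.info("Config's state machine factory method called: " + factory);

        factory.setStateMachineComponentResolver(stateMachineComponentResolver());
        return factory;
    }

    // Make the machine distributed by attaching it to the ZooKeeper ensemble.
    @Override
    public void configure(StateMachineConfigurationConfigurer<String, String> config) throws Exception {
        config
            .withDistributed()
                .ensemble(stateMachineEnsemble());
    }

    @Bean
    public StateMachineEnsemble<String, String> stateMachineEnsemble() throws Exception {
        return new ZookeeperStateMachineEnsemble<String, String>(curatorClient(), "/batchfoo1", true, 512);
    }

    @Bean
    public CuratorFramework curatorClient() throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .defaultData(new byte[0])
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                .connectString("localhost:2181")
                .build();
        client.start();
        return client;
    }
}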
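Problem:

The microservice deployed as a single instance runs successfully even when the events it receives come from a multi-threaded environment, where one thread hits it with an event REST call belonging to region 1 while another thread arrives at the same time for region 2 of the same batch. The machine advances in sync and the parallel regions are processed successfully up to the last state, i.e. completion. We also checked on the ZooKeeper side that the BATCHCOMPLETED state is finally recorded in the current version of the node.

However, when we deploy the same microservice application jar at another location in addition to the first instance, treating it as a second instance of the microservice that also accepts event REST calls (say, by listening on another Tomcat port, 9002), it fails randomly somewhere in the middle. The failure occurs randomly after an event in any one of the parallel regions has been triggered, when ensemble.setState() is called internally as the state changes for that event.

It gives the following error: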

    o.s.s.support.AbstractStateMachine : Interceptors threw exception, skipping state change

org.springframework.statemachine.StateMachineException: Error persisting data; nested exception is org.springframework.statemachine.StateMachineException: Error persisting data; nested exception is org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
        at org.springframework.statemachine.zookeeper.ZookeeperStateMachineEnsemble.setState(ZookeeperStateMachineEnsemble.java:241) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
        at org.springframework.statemachine.ensemble.DistributedStateMachine$LocalStateMachineInterceptor.preStateChange(DistributedStateMachine.java:209) ~[spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.StateMachineInterceptorList.preStateChange(StateMachineInterceptorList.java:101) ~[spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.AbstractStateMachine.callPreStateChangeInterceptors(AbstractStateMachine.java:859) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.AbstractStateMachine.switchToState(AbstractStateMachine.java:880) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.AbstractStateMachine.access$500(AbstractStateMachine.java:81) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.AbstractStateMachine$3.transit(AbstractStateMachine.java:335) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.DefaultStateMachineExecutor.handleTriggerTrans(DefaultStateMachineExecutor.java:286) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.DefaultStateMachineExecutor.handleTriggerTrans(DefaultStateMachineExecutor.java:211) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.DefaultStateMachineExecutor.processTriggerQueue(DefaultStateMachineExecutor.java:449) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.DefaultStateMachineExecutor.access$200(DefaultStateMachineExecutor.java:65) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.DefaultStateMachineExecutor$1.run(DefaultStateMachineExecutor.java:323) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50) [spring-core-4.3.13.RELEASE.jar!/:4.3.13.RELEASE]
        at org.springframework.statemachine.support.DefaultStateMachineExecutor.scheduleEventQueueProcessing(DefaultStateMachineExecutor.java:352) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.DefaultStateMachineExecutor.execute(DefaultStateMachineExecutor.java:163) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.AbstractStateMachine.sendEventInternal(AbstractStateMachine.java:603) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.support.AbstractStateMachine.sendEvent(AbstractStateMachine.java:218) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
        at org.springframework.statemachine.ensemble.DistributedStateMachine.sendEvent(DistributedStateMachine.java:108) 
..skipping Lines....
Caused by: org.springframework.statemachine.StateMachineException: Error persisting data; nested exception is org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
    at org.springframework.statemachine.zookeeper.ZookeeperStateMachinePersist.write(ZookeeperStateMachinePersist.java:113) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
    at org.springframework.statemachine.zookeeper.ZookeeperStateMachinePersist.write(ZookeeperStateMachinePersist.java:50) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
    at org.springframework.statemachine.zookeeper.ZookeeperStateMachineEnsemble.setState(ZookeeperStateMachineEnsemble.java:235) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
    ... 73 common frames omitted
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
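Questions:

1. Does the above configuration need any additional settings to avoid this exception? Both state machine microservice instances were tested in two cases: connecting to the same ZooKeeper instance (i.e. the same connect string, .connectString("localhost:2181").build()) and connecting to different ZooKeeper instances (i.e. "localhost:2181" and "localhost:2182").

In both cases the same BadVersion exception occurs while the state machine ensemble is processing.

2. If batches run in parallel, their respective machines have to be created and run in parallel on the state machine microservice side. So we need a new state machine to be created for each new batchId, all running simultaneously. But looking at ZookeeperStateMachineEnsemble, one znode path seems to be tied to one ensemble, since the ensemble object is instantiated only once in the main configuration class ("StateMachineUMLWayConfiguration").

So does it expect only a singleton ensemble instance? Can't multiple ensembles be created at runtime, each referring to a different znode path and running in parallel, so that each distributed state machine logs its state to its own znode path?

a. Batches running in parallel need separate znode paths. Since we are trying to keep a separate znode path per batch, we need to instantiate a separate ensemble for each batch's machine. But this seems to get into a lock while connecting to the znode through the Curator client.

b. The REST call fired to trigger the event does not complete, because the machine it acquires is stuck in the ensemble, unable to connect.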
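
The per-batch bookkeeping described in question 2 can be sketched as a small registry that lazily creates one resource per batchId and derives a separate znode path for each. This is only an illustration under stated assumptions: PerBatchRegistry, pathFor and the "base path plus batch id" layout are hypothetical names and choices, not part of Spring Statemachine or Curator, and the sketch does not establish whether ZookeeperStateMachineEnsemble behaves well when several instances share one Curator client.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Caches one resource (for example an ensemble) per batch id, created lazily. */
final class PerBatchRegistry<T> {
    private final ConcurrentMap<String, T> byBatchId = new ConcurrentHashMap<>();
    private final String basePath;
    private final Function<String, T> factory; // receives the znode path for the batch

    PerBatchRegistry(String basePath, Function<String, T> factory) {
        this.basePath = basePath;
        this.factory = factory;
    }

    /** Hypothetical layout: one child path per batch under the base path. */
    String pathFor(String batchId) {
        return basePath + "/" + batchId;
    }

    /** Returns the cached resource, creating it at most once per batch id. */
    T get(String batchId) {
        return byBatchId.computeIfAbsent(batchId, id -> factory.apply(pathFor(id)));
    }

    /** Drops the entry once the batch is finished; the caller closes the resource. */
    T remove(String batchId) {
        return byBatchId.remove(batchId);
    }
}

final class PerBatchRegistryDemo {
    public static void main(String[] args) {
        // A String stands in for the real ensemble object in this demo.
        PerBatchRegistry<String> registry =
                new PerBatchRegistry<>("/batchfoo1", path -> "ensemble@" + path);
        System.out.println(registry.get("batch-1"));
        System.out.println(registry.get("batch-2"));
        // Concurrent callers asking for the same batch id share one instance.
        System.out.println(registry.get("batch-1") == registry.get("batch-1"));
    }
}
```

In the real service the factory would be the place where an ensemble is constructed for the given path, so concurrent REST calls for the same batchId would not race to build two of them.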


Thanks in advance.

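I don't know anything about Spring, but the reason for the exception is that ZooKeeper's setData is being called with an incorrect version number. Usually this means another process has already updated the ZNode you are trying to update.

@Randgalt: Thanks for the reply. Yes, the other process in the machine's context is another running ensemble process, which called the setData method to update ZooKeeper. However, I think it should be the ensemble's role to manage the version mismatch, and to resynchronise it with the other ensemble instances, before trying to update a particular node path.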
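
The version check discussed in these comments can be illustrated with a small in-memory model. This is a sketch only: VersionedNode, Snapshot and updateWithRetry are hypothetical names that mimic the semantics of a conditional write (succeed only if the expected version still matches), and the code does not use ZooKeeper, Curator or Spring Statemachine. It shows why a second writer holding a stale version fails, and what a re-read-and-retry loop looks like.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** In-memory stand-in for a znode: data plus a version that increments on every write. */
final class VersionedNode {
    /** Immutable snapshot of data and version. */
    static final class Snapshot {
        final String data;
        final int version;
        Snapshot(String data, int version) { this.data = data; this.version = version; }
    }

    /** Thrown when the caller's expected version is stale (mimics BadVersion). */
    static final class BadVersionException extends RuntimeException {
        BadVersionException(int expected, int actual) {
            super("expected version " + expected + " but node is at version " + actual);
        }
    }

    private final AtomicReference<Snapshot> current;

    VersionedNode(String initialData) {
        current = new AtomicReference<>(new Snapshot(initialData, 0));
    }

    Snapshot read() { return current.get(); }

    /** Conditional write: succeeds only if expectedVersion matches the stored version. */
    Snapshot setData(String data, int expectedVersion) {
        Snapshot seen = current.get();
        Snapshot next = new Snapshot(data, seen.version + 1);
        if (seen.version != expectedVersion || !current.compareAndSet(seen, next)) {
            throw new BadVersionException(expectedVersion, current.get().version);
        }
        return next;
    }

    /** Re-reads the node and retries when the version was stale, up to maxAttempts times. */
    Snapshot updateWithRetry(UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            Snapshot seen = read();
            try {
                return setData(change.apply(seen.data), seen.version);
            } catch (BadVersionException e) {
                if (attempt >= maxAttempts) { throw e; }
            }
        }
    }
}

final class VersionedWriteDemo {
    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("BATCHPROCESSING");
        int staleVersion = node.read().version;          // two writers both read the same version
        node.setData("REGION1_DONE", staleVersion);      // the first writer wins
        try {
            node.setData("REGION2_DONE", staleVersion);  // the second writer is now stale
        } catch (VersionedNode.BadVersionException e) {
            System.out.println("conflict: " + e.getMessage());
        }
        // Re-reading before writing lets the second update go through.
        VersionedNode.Snapshot merged = node.updateWithRetry(d -> d + ",REGION2_DONE", 3);
        System.out.println(merged.data + " at version " + merged.version);
    }
}
```

Two service instances that each cache the last version they saw are in the position of the second writer here, which is consistent with the failure appearing only once a second instance is started.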
@Override
public StateMachine<String, String> acquireDistributedStateMachine(String machineId, boolean start) {
    // Guard the cache so concurrent REST calls for the same machine id share one instance.
    synchronized (distributedMachines) {
        DistributedStateMachine<String, String> distributedStateMachine = distributedMachines.get(machineId);
        if (distributedStateMachine == null) {
            // First request for this machine id: build the machine and cache it.
            StateMachine<String, String> machine = stateMachineFactory.getStateMachine(machineId);
            distributedStateMachine = (DistributedStateMachine<String, String>) machine;
            distributedMachines.put(machineId, distributedStateMachine);
        }
        return handleStart(distributedStateMachine, start);
    }
}