Spring Batch Admin running a maximum of 8 threads for a remote partitioning step even though concurrency is 10?

Tags: spring, spring-batch, spring-integration, spring-batch-admin

I am using Spring Batch remote partitioning for my batch processing, and I launch the jobs using Spring Batch Admin.

I have set the inbound gateway consumer concurrency to 10, but the maximum number of partitions running in parallel is 8.

I want to increase the consumer concurrency to 15 later on.

Below is my configuration:

<task:executor id="taskExecutor" pool-size="50" />

<rabbit:template id="computeAmqpTemplate"
    connection-factory="rabbitConnectionFactory" routing-key="computeQueue"
    reply-timeout="${compute.partition.timeout}">
</rabbit:template>

<int:channel id="computeOutboundChannel">
    <int:dispatcher task-executor="taskExecutor" />
</int:channel>

<int:channel id="computeInboundStagingChannel" />

<amqp:outbound-gateway request-channel="computeOutboundChannel"
    reply-channel="computeInboundStagingChannel" amqp-template="computeAmqpTemplate"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />


<beans:bean id="computeMessagingTemplate"
    class="org.springframework.integration.core.MessagingTemplate"
    p:defaultChannel-ref="computeOutboundChannel"
    p:receiveTimeout="${compute.partition.timeout}" />


<beans:bean id="computePartitionHandler"
    class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler"
    p:stepName="computeStep" p:gridSize="${compute.grid.size}"
    p:messagingOperations-ref="computeMessagingTemplate" />

<int:aggregator ref="computePartitionHandler"
    send-partial-result-on-expiry="true" send-timeout="${compute.step.timeout}"
    input-channel="computeInboundStagingChannel" />

<amqp:inbound-gateway concurrent-consumers="${compute.consumer.concurrency}"
    request-channel="computeInboundChannel" 
    reply-channel="computeOutboundStagingChannel" queue-names="computeQueue"
    connection-factory="rabbitConnectionFactory"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />


<int:channel id="computeInboundChannel" />

<int:service-activator ref="stepExecutionRequestHandler"
    input-channel="computeInboundChannel" output-channel="computeOutboundStagingChannel" />

<int:channel id="computeOutboundStagingChannel" />

<beans:bean id="computePartitioner"
    class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
    p:resources="file:${spring.tmp.batch.dir}/#{jobParameters[batch_id]}/shares_rics/shares_rics_*.txt"
    scope="step" />



<beans:bean id="computeFileItemReader"
    class="org.springframework.batch.item.file.FlatFileItemReader"
    p:resource="#{stepExecutionContext[fileName]}" p:lineMapper-ref="stLineMapper"
    scope="step" />

<beans:bean id="computeItemWriter"
    class="com.st.batch.foundation.writers.ComputeItemWriter"
    p:symfony-ref="symfonyStepScoped" p:timeout="${compute.item.timeout}"
    p:batchId="#{jobParameters[batch_id]}" scope="step" />


<step id="computeStep">
    <tasklet transaction-manager="transactionManager">
        <chunk reader="computeFileItemReader" writer="computeItemWriter"
            commit-interval="${compute.commit.interval}" />
    </tasklet>
</step>

<flow id="computeFlow">
    <step id="computeStep.master">
        <partition partitioner="computePartitioner"
            handler="computePartitionHandler" />
    </step>
</flow>

<job id="computeJob" restartable="true">
    <flow id="computeJob.computeFlow" parent="computeFlow" />
</job>



compute.grid.size = 112
compute.consumer.concurrency = 10

Input files are split into 112 equal parts = compute.grid.size = total number of partitions

Number of servers = 4.

There are two issues:

i) Even though I have set the concurrency to 10, the maximum number of threads running is 8.

ii) Some executions run slower and some faster, so I want to make sure step executions are distributed fairly: if the faster servers finish their executions, the remaining executions waiting in the queue should go to them. They should not be distributed to each server in round-robin fashion.

I know RabbitMQ has a prefetch-count setting and an ack mode to control how work is distributed. With Spring Integration, the prefetch count defaults to 1 and the ack mode defaults to AUTO. Still, some servers keep running more partitions even after other servers have been finished for a long time. Ideally, no server should sit idle.
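If you want to rely on explicit settings rather than the defaults, both can be declared directly on the gateway. This is a sketch based on the gateway definition above; `prefetch-count` and `acknowledge-mode` are standard listener-container attributes in Spring Integration AMQP:

```xml
<!-- Sketch, not from the original post: with prefetch-count="1" each consumer
     holds only one unacked message at a time, so remaining partition requests
     stay in the queue and can be picked up by idle consumers on other servers. -->
<amqp:inbound-gateway concurrent-consumers="${compute.consumer.concurrency}"
    prefetch-count="1" acknowledge-mode="AUTO"
    request-channel="computeInboundChannel"
    reply-channel="computeOutboundStagingChannel" queue-names="computeQueue"
    connection-factory="rabbitConnectionFactory" />
```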

Update:

One more thing I have observed now: for some steps that run in parallel using a split (not distributed via remote partitioning), a maximum of 8 also run in parallel. It looks like a thread-pool limit issue, but as you can see the taskExecutor has a pool size of 50.

Is there anything in Spring Batch / Spring Batch Admin that limits the number of concurrently running steps?
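One thing worth double-checking (a sketch, not a confirmed cause; the step ids below are hypothetical): a `<split>` runs its flows in parallel only through the task executor it references, so its parallelism is bounded by that executor's pool size:

```xml
<!-- Hypothetical step ids, for illustration only. Point the split at an
     executor whose pool is at least as large as the number of flows. -->
<split id="parallelFlows" task-executor="taskExecutor">
    <flow>
        <step id="stepA" parent="computeStep" />
    </flow>
    <flow>
        <step id="stepB" parent="computeStep" />
    </flow>
</split>
```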

Second update:

Also, Spring Batch Admin does not load if 8 or more threads are running and processing items in parallel; it just hangs. If I lower the concurrency, Spring Batch Admin loads. I even tested with a concurrency of 4 on one server and 8 on another: Spring Batch Admin did not load using the URL of the server running 8 threads, but it worked on the server running 4 threads.

The Spring Batch Admin manager has the following jobLauncher configuration:

<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
    <property name="taskExecutor" ref="jobLauncherTaskExecutor" />
</bean>

<task:executor id="jobLauncherTaskExecutor" pool-size="6" rejection-policy="ABORT" />

The pool size there is 6; is that related to the issue above?

Or is there something in Tomcat 7 that limits the number of running threads to 8?
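As far as I can tell, that pool only caps how many jobs can be launched concurrently (a seventh simultaneous launch would be rejected under `rejection-policy="ABORT"`); it does not bound the threads inside a single partitioned step. If several jobs are launched at once, it could be raised, for example (illustrative value):

```xml
<!-- Illustrative sketch: raises the cap on concurrent job launches only. -->
<task:executor id="jobLauncherTaskExecutor" pool-size="16" rejection-policy="ABORT" />
```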

Confused - you say "I have set the concurrency to 10" but then show
compute.consumer.concurrency=8
. So it is working as configured. It is simply not possible to have only 8 consumer threads if that property is set to 10.

From Rabbit's perspective, all consumers are equal - if there are 10 consumers on a slow box and 10 on a fast box, and you only have 10 partitions, it is possible that all 10 partitions end up on the slow box.

RabbitMQ does not distribute work across servers; it only distributes it across consumers.


You will likely get better distribution by reducing the concurrency. You should also set a lower concurrency on the slower boxes.

Are you using a database for the JobRepository?

During execution, the batch framework persists step executions, and the number of connections to the JobRepository database may interfere with parallel step executions.


A concurrency limit of 8 makes me think you might be using a
BasicDataSource
? If so, switch to something like
DriverManagerDataSource
and see.
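For what it's worth, Commons DBCP's BasicDataSource defaults maxActive to 8, which matches the observed cap: each concurrent step execution needs a JobRepository connection to persist its state, so a ninth thread would block waiting on the pool. An alternative to switching data sources is raising the pool size (a sketch; the JDBC property placeholders are hypothetical):

```xml
<!-- Sketch with hypothetical placeholders. maxActive defaults to 8 in DBCP;
     set it above the total step concurrency you expect on this server. -->
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"
    p:driverClassName="${jdbc.driver}" p:url="${jdbc.url}"
    p:username="${jdbc.username}" p:password="${jdbc.password}"
    p:maxActive="30" />
```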

Sorry, that was a mistake in the question; the value in my configuration is actually 10. Yes, I do set the concurrency lower. But ideally, if the consumers on the slower server are busy, consumers on the other servers should receive the messages. It looks like the consumers on the slower server keep receiving messages even while the consumers on the faster servers sit idle. I have added more information about the Spring Batch / Spring Batch Admin issue to the question.

If
concurrent-consumers
is 10, there will be 10 threads. Period. There is nothing in SI/SB that limits it to 8. As I said, Rabbit has no idea whether the next consumer is on a busy or an idle server. It sounds like you have too many consumers for your needs. If some batch jobs need more partitions than others, consider using different configurations.

Ideally it should run 10, but it does not. I have added one more observation.

Hi Vishal, I am facing the same issue. Was it ever resolved? If so, may I know your solution?