Spring Batch Admin remote partitioned step runs at most 8 threads even though concurrency is 10?
I am using Spring Batch remote partitioning for my batch processing, and I launch jobs with Spring Batch Admin. I have set the inbound gateway consumer concurrency for the step to 10, but the maximum number of partitions running in parallel is 8. I would like to raise the consumer concurrency to 15 later. Below is my configuration.
<task:executor id="taskExecutor" pool-size="50" />
<rabbit:template id="computeAmqpTemplate"
connection-factory="rabbitConnectionFactory" routing-key="computeQueue"
reply-timeout="${compute.partition.timeout}">
</rabbit:template>
<int:channel id="computeOutboundChannel">
<int:dispatcher task-executor="taskExecutor" />
</int:channel>
<int:channel id="computeInboundStagingChannel" />
<amqp:outbound-gateway request-channel="computeOutboundChannel"
reply-channel="computeInboundStagingChannel" amqp-template="computeAmqpTemplate"
mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
<beans:bean id="computeMessagingTemplate"
class="org.springframework.integration.core.MessagingTemplate"
p:defaultChannel-ref="computeOutboundChannel"
p:receiveTimeout="${compute.partition.timeout}" />
<beans:bean id="computePartitionHandler"
class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler"
p:stepName="computeStep" p:gridSize="${compute.grid.size}"
p:messagingOperations-ref="computeMessagingTemplate" />
<int:aggregator ref="computePartitionHandler"
send-partial-result-on-expiry="true" send-timeout="${compute.step.timeout}"
input-channel="computeInboundStagingChannel" />
<amqp:inbound-gateway concurrent-consumers="${compute.consumer.concurrency}"
request-channel="computeInboundChannel"
reply-channel="computeOutboundStagingChannel" queue-names="computeQueue"
connection-factory="rabbitConnectionFactory"
mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
<int:channel id="computeInboundChannel" />
<int:service-activator ref="stepExecutionRequestHandler"
input-channel="computeInboundChannel" output-channel="computeOutboundStagingChannel" />
<int:channel id="computeOutboundStagingChannel" />
<beans:bean id="computePartitioner"
class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
p:resources="file:${spring.tmp.batch.dir}/#{jobParameters[batch_id]}/shares_rics/shares_rics_*.txt"
scope="step" />
<beans:bean id="computeFileItemReader"
class="org.springframework.batch.item.file.FlatFileItemReader"
p:resource="#{stepExecutionContext[fileName]}" p:lineMapper-ref="stLineMapper"
scope="step" />
<beans:bean id="computeItemWriter"
class="com.st.batch.foundation.writers.ComputeItemWriter"
p:symfony-ref="symfonyStepScoped" p:timeout="${compute.item.timeout}"
p:batchId="#{jobParameters[batch_id]}" scope="step" />
<step id="computeStep">
<tasklet transaction-manager="transactionManager">
<chunk reader="computeFileItemReader" writer="computeItemWriter"
commit-interval="${compute.commit.interval}" />
</tasklet>
</step>
<flow id="computeFlow">
<step id="computeStep.master">
<partition partitioner="computePartitioner"
handler="computePartitionHandler" />
</step>
</flow>
<job id="computeJob" restartable="true">
<flow id="computeJob.computeFlow" parent="computeFlow" />
</job>
compute.grid.size = 112
compute.consumer.concurrency = 10
Input files are split into 112 equal parts = compute.grid.size = total number of partitions.
Number of servers = 4.
There are two issues:

(i) Even though I have set the concurrency to 10, the maximum number of threads running is 8.

(ii) Some partitions run slower and some faster, so I want to make sure step executions are distributed fairly: when the faster servers finish their executions, the remaining executions waiting in the queue should go to them. Work should not be handed out round-robin regardless of load.

I know that in RabbitMQ the prefetch count and the ack mode control how messages are distributed. With Spring Integration, the prefetch count defaults to 1 and the ack mode defaults to AUTO. Even so, some servers keep running more partitions while others finished long ago. Ideally, no server should sit idle.
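If fair distribution is the goal, keeping the prefetch count at 1 (so a consumer only takes a new message once it is free) is the right direction, and it can be stated explicitly on the inbound gateway rather than relied on as a default. A minimal sketch, assuming the `prefetch-count` attribute is available in your spring-integration-amqp schema version (verify against your version):

```xml
<amqp:inbound-gateway concurrent-consumers="${compute.consumer.concurrency}"
    prefetch-count="1"
    request-channel="computeInboundChannel"
    reply-channel="computeOutboundStagingChannel"
    queue-names="computeQueue"
    connection-factory="rabbitConnectionFactory"
    mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
    mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
```

With a prefetch of 1 and AUTO acks, a busy consumer holds no backlog, so the broker hands the next partition request to whichever consumer frees up first, regardless of server.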
Update:

Another thing I have observed: for steps running in parallel via a split (not distributed with remote partitioning), at most 8 also run in parallel. It looks like a thread-pool limit, but as you can see, the taskExecutor pool size is set to 50.

Is there anything in Spring Batch or Spring Batch Admin that limits the number of concurrently running steps?
Second update:

Also, Spring Batch Admin does not load if 8 or more threads are running and processing items; it just hangs. If I lower the concurrency, Spring Batch Admin loads. I even set the concurrency to 4 on one server and 8 on another: Spring Batch Admin did not load using the URL of the server running 8 threads, but it worked on the server running 4 threads.

Spring Batch Admin has the following jobLauncher configuration:
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository" />
<property name="taskExecutor" ref="jobLauncherTaskExecutor" />
</bean>
<task:executor id="jobLauncherTaskExecutor" pool-size="6" rejection-policy="ABORT" />
The pool size there is 6; could that be related to the issue above? Or is there something in Tomcat 7 that limits the number of running threads to 8?

Confused: you say "I have set concurrency to 10", but then show compute.consumer.concurrency=8. So it is working as configured. It is not possible to get only 8 consumer threads if the property is set to 10.
From Rabbit's perspective, all consumers are equal. If you have 10 consumers on a slow box and 10 on a fast box, and only 10 partitions, it is quite possible that all 10 partitions end up on the slow box.

RabbitMQ does not distribute work across servers; it only distributes work across consumers.

You may get better distribution by reducing the concurrency. You should also set a lower concurrency on the slower boxes. Are you using a database for the JobRepository? During execution, the batch framework keeps persisting step executions, and the number of connections to the JobRepository database may interfere with parallel step executions.
A concurrency of 8 makes me think you may be using a BasicDataSource? If so, switch to something like a DriverManagerDataSource, and see.

Sorry, my mistake: the value in my configuration really is 10. Yes, I do set the concurrency lower. But ideally, if the consumers on the slower server are busy, the consumers on the other servers should receive the messages. It looks like consumers on the slower server keep receiving messages even while consumers on the faster servers sit idle. I have added more information to the question; it looks like a Spring Batch / Spring Batch Admin issue.

If concurrent-consumers is 10, there will be 10 threads. Period. There is nothing in SI/SB that limits it to 8. As I said, Rabbit does not know whether the next consumer is on a busy or an idle server. It sounds like you have too many consumers for your needs. If some batch jobs need more partitions than others, consider using different configurations for them.

Ideally it should run 10, but it does not. I have added one more observation.

Hi Vishal, I am facing the same problem. Has it been resolved? If so, may I know your solution?
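The BasicDataSource hint fits the symptoms well: Commons DBCP's BasicDataSource defaults `maxActive` to 8, so if each running step execution holds a JobRepository connection, only 8 steps can make progress at once and the admin UI starves waiting for a connection. A minimal sketch of sizing the pool above the consumer concurrency, assuming commons-dbcp 1.x property names and hypothetical `jdbc.*` placeholder properties:

```xml
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"
    destroy-method="close">
    <property name="driverClassName" value="${jdbc.driver}" />
    <property name="url" value="${jdbc.url}" />
    <property name="username" value="${jdbc.username}" />
    <property name="password" value="${jdbc.password}" />
    <!-- default maxActive is 8; raise it above consumer concurrency so
         all partition steps plus the admin UI can each get a connection -->
    <property name="maxActive" value="20" />
    <property name="maxIdle" value="20" />
</bean>
```

With concurrency 10 per server, anything at or above 11 or 12 active connections leaves headroom for the admin UI; 20 is just an illustrative value.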