Multithreading with the Spring Batch FlatFileItemReader

multithreading spring spring-batch

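In Spring Batch, I am trying to read a CSV file and want to hand each line to a separate thread for processing. I tried to do this with a TaskExecutor, but all the threads pick up the same line at the same time. I also tried to implement the idea with a Partitioner, and the same thing happened. Please see my configuration XML below.

Step definition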

    <step id="Step2">
        <tasklet task-executor="taskExecutor">
            <chunk reader="reader" processor="processor" writer="writer" commit-interval="1" skip-limit="1">
            </chunk>
        </tasklet> 
    </step>

    <bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
        <property name="resource" value="file:cvs/user.csv" />
        <property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <!-- split it -->
                <property name="lineTokenizer">
                    <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                        <property name="names" value="userid,customerId,ssoId,flag1,flag2" />
                    </bean>
                </property>
                <!-- map to an object -->
                <property name="fieldSetMapper">
                    <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                        <property name="prototypeBeanName" value="user" />
                    </bean>
                </property>
            </bean>
        </property>
    </bean>

    <bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor">
        <property name="concurrencyLimit" value="4" />
    </bean>

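I have tried different kinds of task executors, but they all behave the same way. How can I assign each line to a separate thread?

FlatFileItemReader is not thread-safe. In your case, you could split the CSV file into smaller CSV files and then use a MultiResourcePartitioner to process each of them. This takes two steps: one that splits the original file (into, say, 10 smaller files) and one that processes the split files. That way you avoid the problem, because each file is processed by exactly one thread.

For example: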

<batch:job id="csvsplitandprocess">
    <batch:step id="step1" next="step2master">
        <batch:tasklet>
            <batch:chunk reader="largecsvreader" writer="csvwriter" commit-interval="500">
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
    <batch:step id="step2master">
        <batch:partition step="step2" partitioner="partitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor"/>
        </batch:partition>
    </batch:step>
</batch:job>

<batch:step id="step2">
    <batch:tasklet>
        <batch:chunk reader="smallcsvreader" writer="writer" commit-interval="100">
        </batch:chunk>
    </batch:tasklet>
</batch:step>


<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="10" />
    <property name="maxPoolSize" value="10" />
</bean>

<bean id="partitioner"
      class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
    <property name="resources" value="file:cvs/extracted/*.csv" />
</bean>
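
Step 1 of the job above copies the large file into smaller ones; the answer does not show its reader and writer. As a rough illustration of what such a split does, here is a minimal plain-Java sketch that does not use Spring Batch at all. The class name CsvSplitter, the round-robin distribution and the part-N.csv file names are assumptions made for this example, and header rows are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: distributes the lines of one CSV source over N smaller parts. */
final class CsvSplitter {

    /** Reads every line and assigns it round-robin to one of `parts` buckets. */
    static List<List<String>> split(BufferedReader source, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<List<String>> buckets = new ArrayList<List<String>>();
        for (int i = 0; i < parts; i++) {
            buckets.add(new ArrayList<String>());
        }
        try {
            String line;
            int index = 0;
            while ((line = source.readLine()) != null) {
                buckets.get(index % parts).add(line);
                index++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buckets;
    }

    /** Writes each bucket to its own file (part-0.csv, part-1.csv, ...) inside `dir`. */
    static void writeParts(List<List<String>> buckets, Path dir) {
        try {
            Files.createDirectories(dir);
            for (int i = 0; i < buckets.size(); i++) {
                Files.write(dir.resolve("part-" + i + ".csv"), buckets.get(i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "1,c1,s1,Y,N\n2,c2,s2,Y,N\n3,c3,s3,N,N\n";
        List<List<String>> buckets = split(new BufferedReader(new StringReader(csv)), 2);
        System.out.println(buckets);
    }
}
```

Round-robin distribution keeps the split streaming, since the total line count does not need to be known up front. If the order of lines within a part matters, contiguous blocks would be the better choice.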

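An alternative to partitioning would be a custom thread-safe reader that hands each line to its own thread, but partitioning is probably your best option.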
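
To make the thread-safe reader alternative concrete, here is a minimal plain-Java sketch of the idea: a shared, non-thread-safe source is wrapped so that only one thread can fetch a line at a time, which guarantees that no line is picked up twice. SynchronizedLineReader and processAll are hypothetical names for this example; the Iterator stands in for the real file reader, and this is not the Spring Batch ItemReader interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper: serializes access to a shared, non-thread-safe line source. */
final class SynchronizedLineReader {
    private final Iterator<String> delegate;

    SynchronizedLineReader(Iterator<String> delegate) {
        this.delegate = delegate;
    }

    /** Returns the next line, or null when the input is exhausted. */
    synchronized String read() {
        return delegate.hasNext() ? delegate.next() : null;
    }

    /** Drains the reader with `threads` workers and returns every line that was processed. */
    static List<String> processAll(final SynchronizedLineReader reader, int threads) {
        final List<String> processed = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                String line;
                // Each call to read() hands out a distinct line to exactly one worker.
                while ((line = reader.read()) != null) {
                    processed.add(line);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("row1", "row2", "row3", "row4");
        List<String> done = processAll(new SynchronizedLineReader(lines.iterator()), 4);
        System.out.println(done.size() + " lines processed");
    }
}
```

Reading is still sequential here; only the processing runs in parallel. Spring Batch ships a SynchronizedItemStreamReader wrapper built on the same idea, though whether it fits a given job (restartability in particular) would need to be checked.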

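Your problem is that your reader is not step-scoped.

This means that all threads share the same input stream (the resource file).

To process one line per thread, you need to:

Make sure every thread reads the file from beginning to end (each thread should open the stream and close it, once per execution context)

Have the partitioner put the start and end positions into the execution context of each partition

Have your reader read the file using those positions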
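
The three requirements above can be sketched in plain Java. The sketch assumes that the positions are 1-based, inclusive line numbers; RangeLineReader and readRange are hypothetical names, and the Supplier stands in for whatever opens the CSV resource. Each call opens and closes its own stream, so concurrent callers never share reader state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical reader that returns only the lines inside a given range. */
final class RangeLineReader {

    /**
     * Opens a fresh stream from `source`, skips the lines before `from`,
     * and returns lines from..to (1-based, inclusive).
     */
    static List<String> readRange(Supplier<Reader> source, int from, int to) {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(source.get())) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (lineNumber < from) {
                    continue;   // not yet inside this partition's range
                }
                if (lineNumber > to) {
                    break;      // past the end of the range, stop early
                }
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        Supplier<Reader> csv = () -> new StringReader("u1,c1\nu2,c2\nu3,c3\nu4,c4\n");
        System.out.println(readRange(csv, 2, 3));
    }
}
```

Skipping by line number means every partition re-reads the file up to its start line, which is acceptable for small files but wasteful for large ones. In Spring Batch itself, a step-scoped FlatFileItemReader whose linesToSkip and maxItemCount properties are bound from the step execution context might achieve a similar effect; that is an untested assumption here.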

I wrote some code, and this is the output:

Code of the com.test.partitioner.RangePartitioner class:
// Implements Partitioner.partition(int gridSize); gridSize comes from the
// grid-size attribute of the partition handler in the step configuration.
@Override
public Map<String, ExecutionContext> partition(int gridSize) {

    Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();

    // one line per partition
    int range = 1;
    int fromId = 1;
    int toId = range;

    for (int i = 1; i <= gridSize; i++) {
        ExecutionContext value = new ExecutionContext();

        log.debug("\nStarting : Thread" + i);
        log.debug("fromId : " + fromId);
        log.debug("toId : " + toId);

        // positions that the step-scoped reader uses to pick its lines
        value.putInt("fromId", fromId);
        value.putInt("toId", toId);

        // give each thread a name, thread 1,2,3
        value.putString("name", "Thread" + i);

        result.put("partition" + i, value);

        // advance to the next range
        fromId = toId + 1;
        toId += range;
    }

    return result;
}

Output console:
Starting : Thread1
fromId : 1
toId : 1
Starting : Thread2
fromId : 2
toId : 2
Starting : Thread3
fromId : 3
toId : 3
Starting : Thread4
fromId : 4
toId : 4
Starting : Thread5
fromId : 5
toId : 5
Starting : Thread6
fromId : 6
toId : 6
Starting : Thread7
fromId : 7
toId : 7
Starting : Thread8
fromId : 8
toId : 8
Starting : Thread9
fromId : 9
toId : 9
Starting : Thread10
fromId : 10
toId : 10
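
The partitioner above hard-codes a range of 1, so a grid size of 10 covers exactly 10 lines. For a file with more lines than partitions, the range would have to be derived from the line count. The following plain-Java sketch shows one way to compute such ranges; LineRanges and compute are hypothetical names, and the total line count is assumed to be known in advance (for example from a prior counting step).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits lines 1..totalLines into contiguous ranges. */
final class LineRanges {

    /** Returns at most gridSize entries, each mapping a partition name to {fromId, toId}. */
    static Map<String, int[]> compute(int totalLines, int gridSize) {
        if (totalLines < 0 || gridSize < 1) {
            throw new IllegalArgumentException("totalLines must be >= 0 and gridSize >= 1");
        }
        Map<String, int[]> result = new LinkedHashMap<String, int[]>();
        int range = (totalLines + gridSize - 1) / gridSize; // ceiling division
        int fromId = 1;
        for (int i = 1; i <= gridSize && fromId <= totalLines; i++) {
            // The last range is clipped so it never runs past the end of the file.
            int toId = Math.min(fromId + range - 1, totalLines);
            result.put("partition" + i, new int[] {fromId, toId});
            fromId = toId + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, int[]> e : compute(10, 3).entrySet()) {
            System.out.println(e.getKey() + " " + Arrays.toString(e.getValue()));
        }
    }
}
```

With 10 lines and a grid size of 10 this yields the same one-line-per-partition ranges as the output shown above.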
See the configuration below:
http://www.springframework.org/schema/batch/spring-batch-2.2.xsd

com.test.model.movement


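TODO: replace my reader with one that works with positions (start and end position), like the Scanner class in Java.
Hope this helps.

You can split the input file into multiple files, use a Partitioner, and load the small files with threads. On error, however, you have to restart the whole job after cleaning up the database.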
<batch:job id="transformJob">
    <batch:step id="deleteDir" next="cleanDB">
        <batch:tasklet ref="fileDeletingTasklet" />
    </batch:step>
    <batch:step id="cleanDB" next="split">
        <batch:tasklet ref="countThreadTasklet" />
    </batch:step>
    <batch:step id="split" next="partitionerMasterImporter">
        <batch:tasklet>
            <batch:chunk reader="largeCSVReader" writer="smallCSVWriter" commit-interval="#{jobExecutionContext['chunk.count']}" />
        </batch:tasklet>
    </batch:step>
    <batch:step id="partitionerMasterImporter" next="partitionerMasterExporter">
        <batch:partition step="importChunked" partitioner="filePartitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>
    <!-- the partitionerMasterExporter step referenced above is not shown in the original -->
</batch:job>




Hope this helps.
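You can refer to this.

Yes... I realize I have to pick one of these two options... but which one performs better?

Partitioning, of course, because the custom reader would still process the file line by line. Many smaller CSV files, on the other hand, are processed concurrently (in the partitioned step). Keep in mind that besides scaling techniques there are many other factors in tuning performance (such as the commit interval and the skip and retry policies), and each case usually has its own bottleneck. Hope that helps!

Great. Is it safe to use a multithreaded FlatFileItemReader with remote partitioning?

I guess so. As shown, you have to be very careful about where the reader's resource lives (it must be local to the slaves). I think it will work; if not, a new question can sort it out :) (I have not tried the reader you mentioned, but from a quick look it seems very handy.)

If we use MultiResourcePartitioner, how should the reader be configured? Can we use FlatFileItemReader with the resource #{stepExecutionContext[fileName]}? Or do we need to use MultiResourceItemReader?

Interesting solution. I have edited it to fix the formatting and add the missing method signature, but I may have got it wrong. Could you check it and add the correct method name and the connection options that initialize the gridSize variable?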