Java: split an array into chunks and process the chunks in parallel with multiple threads


I have two arrays: one with about 2,000 records, and another with just 6 records containing access tokens. I want to split the big array into chunks of 100, assign an access token from the other array to process each chunk of 100 records, and keep going until all 2,000 records are processed. Once the last access token has been mapped to a chunk of 100 records, the next chunk of 100 should be mapped to the first token again (I implemented a round-robin iterator to keep pulling tokens from the token list). I tried to implement this with an executor service, creating a thread pool of size bigArray.length / 100. But something seems wrong with my multithreading logic: I can process all the IDs and print them, but while saving to the database via Spring JPA the system crashes, hangs, and throws an out-of-memory error:

Out of Memory error
Java heap space
HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=52s882ms437µs947ns).
2020-06-07 13:02:04.195  WARN 8214 --- [ool-1-thread-18] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 0, SQLState: null
2020-06-07 13:02:04.196  WARN 8214 --- [ool-1-thread-10] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 0, SQLState: null
==========================================================================

private void processIds(MyService service, long[] ids, List<Tokens> tokens) {

        int threadsCount = (int)ids.length / 100;
        ExecutorService executorService = Executors.newFixedThreadPool(threadsCount);
        RoundRobinUtil<Tokens> tokensIterator = new RoundRobinUtil<Tokens>();
        tokensIterator.setInputList(tokens);

        int k = 0;
        int j = 0;
        while(k <= ids.length){
            long[] newIds = new long[100];


            int iterationLength = (ids.length - k) < 100 ? (ids.length - k) : 100;
            for(int i = 0; i<iterationLength; i++, j++){
                newIds[i] = ids[j];  //fetch 100 elements from big array and create a new array //of 100 elements
            }

            executorService.execute(new MyThread(newIds, service, repo, tokensIterator.iterator().next()));   // assigning each 100 elements of the big array to a token //for processing in an independent thread 

            k = k + iterationLength;
       }
    executorService.shutdown();
}

@Data
@NoArgsConstructor
@AllArgsConstructor
class MyThread extends Thread {
      private long[] ids;
      private Service service;
      private Repository repo;
      private Token token;

      @Override
      public void run() {
        //process all the 100 ids of array with a token
         UserDetails entity = new UserDetails();
         ResponseList<User> details = service.fetchDetails(ids);
         for(User u : details) {
             entity.setName(u.getName());
             repo.save(entity);
         } 
        //save details of 100 ids to database 
      }

}
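The RoundRobinUtil referenced above isn't shown in the question; a minimal sketch of what such a cycling iterator might look like, with the class and method names assumed from how it is called in processIds:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of RoundRobinUtil: an iterator that cycles over the
// token list forever, wrapping back to the first element after the last.
class RoundRobinUtil<T> implements Iterable<T> {
    private List<T> inputList;
    private int next = 0;

    public void setInputList(List<T> inputList) {
        this.inputList = inputList;
    }

    @Override
    public Iterator<T> iterator() {
        // The cursor lives in the util itself, so calling iterator().next()
        // repeatedly (as processIds does) still advances round-robin
        // across calls instead of restarting at the first token.
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return inputList != null && !inputList.isEmpty();
            }

            @Override
            public T next() {
                T value = inputList.get(next);
                next = (next + 1) % inputList.size();
                return value;
            }
        };
    }
}
```

With tokens ["a", "b"], successive next() calls yield "a", "b", "a", "b", …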
1) Waiting for the executor service to shut down is considered good practice, since it may otherwise keep running in the background: see
awaitTermination

2) I wouldn't let variable names like i, k and j through my code review ;-)

3) Please use System.arraycopy(src, srcPos, dest, destPos, length) instead of the for loop. It is much faster

4) I would replace the "threadsCount" heuristic with something more sensible. If the ID list passed in is long, it produces an unwanted number of threads

I have found this works best on our systems:

Runtime runtime = Runtime.getRuntime();
ExecutorService executor = Executors.newFixedThreadPool(runtime.availableProcessors());
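Putting points 1), 3) and 4) together, the chunk-and-submit loop could be restructured roughly as below. This is a sketch, not the asker's actual code: the per-chunk work and token handling are elided, and the chunk size of 100 is taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ChunkSubmitter {
    static final int CHUNK_SIZE = 100;

    // Split ids into chunks of at most CHUNK_SIZE, copying each slice with
    // System.arraycopy instead of a manual index loop (point 3).
    static List<long[]> chunk(long[] ids) {
        List<long[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < ids.length; offset += CHUNK_SIZE) {
            int length = Math.min(CHUNK_SIZE, ids.length - offset);
            long[] chunkIds = new long[length];
            System.arraycopy(ids, offset, chunkIds, 0, length);
            chunks.add(chunkIds);
        }
        return chunks;
    }

    static void process(long[] ids) throws InterruptedException {
        // Size the pool from the available cores, not ids.length / 100 (point 4).
        ExecutorService executor =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (long[] chunkIds : chunk(ids)) {
            executor.execute(() -> {
                // process one chunk with a round-robin token here
            });
        }
        executor.shutdown();
        // Block until the submitted tasks finish instead of leaving them
        // running in the background (point 1).
        executor.awaitTermination(15, TimeUnit.MINUTES);
    }
}
```

Because the loop advances by `offset += CHUNK_SIZE`, it terminates exactly when the array is exhausted and never submits an empty chunk.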

A few things I tried to optimize the code; it now fetches results very quickly and the system no longer grinds to a halt:

  • Converted each 100-element array into a 100-element list and put each list into a HashMap; in my tests ArrayList consistently performed better than raw arrays
  • idList = Arrays.stream(ids.getIDs()).boxed().collect(Collectors.toList());
    listMap.put(listMap.size() + 1, idList);

  • Updated the processIds method and added some parallel processing with the parallel stream API:

     userListMap.entrySet().parallelStream().forEach(entry -> {
         log.info("now inside map : key "+entry.getKey()+" -- value size :"+entry.getValue().size());
         List<List<Long>> partition = Lists.partition(entry.getValue(), 100);
         partition.stream().parallel().forEach(list -> {
             log.info("now inside list of size:"+ list.size());
             executorService.submit(new MyThread(list.stream().mapToLong(l -> l).toArray(), service, repo,
                     tokens.iterator().next()));
         });
     });
    
    
     log.info("now shutting down  executor service");
     executorService.shutdown();
    
     log.info("*****waiting for task to be completed*****");
     System.out.println("*****waiting for task to be completed*****");
     try {
         executorService.awaitTermination(15, TimeUnit.MINUTES);
     } catch (InterruptedException e) {
         // TODO Auto-generated catch block
         e.printStackTrace();
     }
    
    
  • Also updated the MyThread implementation to use saveAll instead of save inside the run method:

    repo.saveAll(entities);

    So: using ArrayList instead of arrays, processing the HashMap and lists in parallel, and batch-saving all the entities with saveAll were the tricks that helped optimize this
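    As a rough sketch of the saveAll change, with stand-in User/UserDetails types since the real entities aren't shown: build one fresh entity per user (the original reused a single entity across the loop, so every saved row would carry the last user's name), then persist the whole batch in one call.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Minimal stand-ins for the question's types, just so the sketch compiles.
    class User {
        private final String name;
        User(String name) { this.name = name; }
        String getName() { return name; }
    }

    class UserDetails {
        private String name;
        void setName(String name) { this.name = name; }
        String getName() { return name; }
    }

    class BatchSaver {
        // One fresh entity per user; the caller then does a single
        // repo.saveAll(entities) instead of repo.save(entity) per row.
        static List<UserDetails> toEntities(List<User> details) {
            List<UserDetails> entities = new ArrayList<>();
            for (User u : details) {
                UserDetails entity = new UserDetails();
                entity.setName(u.getName());
                entities.add(entity);
            }
            return entities;
        }
    }
    ```

    With Spring Data, saveAll lets the provider batch the inserts in one transaction rather than opening a round-trip per row.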


    Thanks!
