Java: split an array into chunks and process the chunks in parallel with multiple threads


I have two arrays: one with about 2,000 records, and another with just 6 records containing access tokens. I want to split the big array into chunks of 100, assign an access token from the other array to process each chunk of 100 records, and keep going until all 2,000 records are processed. Once the last access token has been mapped to a chunk of 100 records, the next chunk of 100 should be mapped to the first token again (I implemented a round-robin iterator to keep pulling tokens from the token list). I tried to implement this with an executor service, creating a thread pool of size bigArray.length / 100. But something seems wrong with my multithreading logic: I can process all the IDs and print them, but while saving to the database via Spring JPA the system crashes, hangs, and throws an out-of-memory error:

Out of Memory error
Java heap space
HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=52s882ms437µs947ns).
2020-06-07 13:02:04.195  WARN 8214 --- [ool-1-thread-18] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 0, SQLState: null
2020-06-07 13:02:04.196  WARN 8214 --- [ool-1-thread-10] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 0, SQLState: null
==========================================================================

private void processIds(MyService service, long[] ids, List<Tokens> tokens) {

        int threadsCount = (int)ids.length / 100;
        ExecutorService executorService = Executors.newFixedThreadPool(threadsCount);
        RoundRobinUtil<Tokens> tokensIterator = new RoundRobinUtil<Tokens>();
        tokensIterator.setInputList(tokens);

        int k = 0;
        int j = 0;
        while(k <= ids.length){
            long[] newIds = new long[100];


            int iterationLength = (ids.length - k) < 100 ? (ids.length - k) : 100;
            for(int i = 0; i<iterationLength; i++, j++){
                newIds[i] = ids[j];  //fetch 100 elements from big array and create a new array //of 100 elements
            }

            executorService.execute(new MyThread(newIds, service, repo, tokensIterator.iterator().next()));   // assigning each 100 elements of the big array to a token //for processing in an independent thread 

            k = k + iterationLength;
       }
    executorService.shutdown();
}

@Data
@NoArgsConstructor
@AllArgsConstructor
class MyThread extends Thread {
      private long[] ids;
      private Service service;
      private Repository repo;
      private Token token;

      @Override
      public void run() {
        //process all the 100 ids of array with a token
         UserDetails entity = new UserDetails();
         ResponseList<User> details = service.fetchDetails(ids);
         for(User u : details) {
             entity.setName(u.getName());
             repo.save(entity);
         } 
        //save details of 100 ids to database 
      }

}
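The RoundRobinUtil referenced above isn't shown in the question; a minimal sketch of what such a cycling iterator might look like, with the class and method names assumed from how it is called in processIds:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of RoundRobinUtil: an iterator that cycles over the
// token list forever, wrapping back to the first element after the last.
class RoundRobinUtil<T> implements Iterable<T> {
    private List<T> inputList;
    private int next = 0;

    public void setInputList(List<T> inputList) {
        this.inputList = inputList;
    }

    @Override
    public Iterator<T> iterator() {
        // The cursor lives in the util itself, so calling iterator().next()
        // repeatedly (as processIds does) still advances round-robin
        // across calls instead of restarting at the first token.
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return inputList != null && !inputList.isEmpty();
            }

            @Override
            public T next() {
                T value = inputList.get(next);
                next = (next + 1) % inputList.size();
                return value;
            }
        };
    }
}
```

With tokens ["a", "b"], successive next() calls yield "a", "b", "a", "b", …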
1) Waiting for the executor service to shut down is considered good practice, since it may otherwise keep running in the background: see
awaitTermination

2) I wouldn't let variable names like i, k and j through my code review ;-)

3) Please use System.arraycopy(src, srcPos, dest, destPos, length) instead of the for loop. It is much faster

4) I would replace the "threadsCount" heuristic with something more sensible. If the ID list passed in is long, it produces an unwanted number of threads

I have found this works best on our systems:

Runtime runtime = Runtime.getRuntime();
ExecutorService executor = Executors.newFixedThreadPool(runtime.availableProcessors());
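Putting points 1), 3) and 4) together, the chunk-and-submit loop could be restructured roughly as below. This is a sketch, not the asker's actual code: the per-chunk work and token handling are elided, and the chunk size of 100 is taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ChunkSubmitter {
    static final int CHUNK_SIZE = 100;

    // Split ids into chunks of at most CHUNK_SIZE, copying each slice with
    // System.arraycopy instead of a manual index loop (point 3).
    static List<long[]> chunk(long[] ids) {
        List<long[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < ids.length; offset += CHUNK_SIZE) {
            int length = Math.min(CHUNK_SIZE, ids.length - offset);
            long[] chunkIds = new long[length];
            System.arraycopy(ids, offset, chunkIds, 0, length);
            chunks.add(chunkIds);
        }
        return chunks;
    }

    static void process(long[] ids) throws InterruptedException {
        // Size the pool from the available cores, not ids.length / 100 (point 4).
        ExecutorService executor =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (long[] chunkIds : chunk(ids)) {
            executor.execute(() -> {
                // process one chunk with a round-robin token here
            });
        }
        executor.shutdown();
        // Block until the submitted tasks finish instead of leaving them
        // running in the background (point 1).
        executor.awaitTermination(15, TimeUnit.MINUTES);
    }
}
```

Because the loop advances by `offset += CHUNK_SIZE`, it terminates exactly when the array is exhausted and never submits an empty chunk.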

A few things I tried to optimize the code; it now fetches results very quickly and the system no longer grinds to a halt:

  • Converted each 100-element array into a 100-element list and put each list into a HashMap; in my tests ArrayList consistently performed better than raw arrays
  • idList = Arrays.stream(ids.getIDs()).boxed().collect(Collectors.toList());
    listMap.put(listMap.size() + 1, idList);

  • Updated the processIds method and added some parallel processing with the parallel stream API:

     userListMap.entrySet().parallelStream().forEach(entry -> {
         log.info("now inside map : key "+entry.getKey()+" -- value size :"+entry.getValue().size());
         List<List<Long>> partition = Lists.partition(entry.getValue(), 100);
         partition.stream().parallel().forEach(list -> {
             log.info("now inside list of size:"+ list.size());
             executorService.submit(new MyThread(list.stream().mapToLong(l -> l).toArray(), service, repo,
                     tokens.iterator().next()));
         });
     });
    
    
     log.info("now shutting down  executor service");
     executorService.shutdown();
    
     log.info("*****waiting for task to be completed*****");
     System.out.println("*****waiting for task to be completed*****");
     try {
         executorService.awaitTermination(15, TimeUnit.MINUTES);
     } catch (InterruptedException e) {
         // TODO Auto-generated catch block
         e.printStackTrace();
     }
    
    
  • Also updated the MyThread implementation to use saveAll instead of save inside the run method:

    repo.saveAll(entities);

    So: using ArrayList instead of arrays, processing the HashMap and lists in parallel, and batch-saving all the entities with saveAll were the tricks that helped optimize this
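    As a rough sketch of the saveAll change, with stand-in User/UserDetails types since the real entities aren't shown: build one fresh entity per user (the original reused a single entity across the loop, so every saved row would carry the last user's name), then persist the whole batch in one call.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Minimal stand-ins for the question's types, just so the sketch compiles.
    class User {
        private final String name;
        User(String name) { this.name = name; }
        String getName() { return name; }
    }

    class UserDetails {
        private String name;
        void setName(String name) { this.name = name; }
        String getName() { return name; }
    }

    class BatchSaver {
        // One fresh entity per user; the caller then does a single
        // repo.saveAll(entities) instead of repo.save(entity) per row.
        static List<UserDetails> toEntities(List<User> details) {
            List<UserDetails> entities = new ArrayList<>();
            for (User u : details) {
                UserDetails entity = new UserDetails();
                entity.setName(u.getName());
                entities.add(entity);
            }
            return entities;
        }
    }
    ```

    With Spring Data, saveAll lets the provider batch the inserts in one transaction rather than opening a round-trip per row.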


    Thanks!
