Java: split an array into chunks and process the chunks in parallel with multiple threads
I have two arrays: one with about 2000 records and one with only 6 records (the access tokens). I want to split the big array into chunks of 100, assign one access token from the other array to process those 100 records, and keep going until all 2000 records are processed. Once the last access token has been mapped to a chunk of 100, the next chunk of 100 should map to the first token again (I implemented a round-robin iterator to keep pulling tokens from the token list). I tried to implement this with an executor service, creating a thread pool of size bigarray.length / 100. But it looks like something is wrong with my multithreading logic: I can process all the IDs and print them, but when saving to the database through Spring JPA the system crashes, hangs, and throws an out-of-memory error:
Out of Memory error
Java heap space
HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=52s882ms437µs947ns).
2020-06-07 13:02:04.195 WARN 8214 --- [ool-1-thread-18] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: null
2020-06-07 13:02:04.196 WARN 8214 --- [ool-1-thread-10] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: null
==========================================================================
private void processIds(MyService service, long[] ids, List<Tokens> tokens) {
    int threadsCount = ids.length / 100;
    ExecutorService executorService = Executors.newFixedThreadPool(threadsCount);
    RoundRobinUtil<Tokens> tokensIterator = new RoundRobinUtil<Tokens>();
    tokensIterator.setInputList(tokens);
    int k = 0;
    int j = 0;
    while (k <= ids.length) {
        long[] newIds = new long[100];
        int iterationLength = (ids.length - k) < 100 ? (ids.length - k) : 100;
        // fetch 100 elements from the big array into a new array of 100 elements
        for (int i = 0; i < iterationLength; i++, j++) {
            newIds[i] = ids[j];
        }
        // assign each chunk of 100 elements to a token for processing in an independent thread
        executorService.execute(new MyThread(newIds, service, repo, tokensIterator.iterator().next()));
        k = k + iterationLength;
    }
    executorService.shutdown();
}

@Data
@NoArgsConstructor
@AllArgsConstructor
class MyThread extends Thread {
    private long[] ids;
    private Service service;
    private Repository repo;
    private Token token;

    @Override
    public void run() {
        // process all 100 ids of the array with one token,
        // then save the details of those 100 ids to the database
        UserDetails entity = new UserDetails();
        ResponseList<User> details = service.fetchDetails(ids);
        for (User u : details) {
            entity.setName(u.getName());
            repo.save(entity);
        }
    }
}
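The RoundRobinUtil used above is not shown in the question. A minimal thread-safe round-robin iterator along those lines might look like this (the class name and constructor shape here are assumptions, not the poster's actual utility):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Cycles over a fixed list forever: after handing out the last element
// it wraps back to the first. AtomicInteger keeps it safe even if
// several threads pull tokens concurrently.
class RoundRobin<T> {
    private final List<T> items;
    private final AtomicInteger cursor = new AtomicInteger(0);

    RoundRobin(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("items must not be empty");
        }
        this.items = List.copyOf(items); // defensive, immutable copy
    }

    T next() {
        // getAndIncrement can overflow after ~2^31 calls; floorMod keeps
        // the resulting index non-negative even then.
        int i = Math.floorMod(cursor.getAndIncrement(), items.size());
        return items.get(i);
    }
}
```

With six tokens, chunk 7 gets token 1 again, which matches the wrap-around behaviour the question describes.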
1) Waiting for the executor service to actually shut down is considered good practice, since otherwise it may keep running in the background: see awaitTermination.
2) I would not let variable names like i, k and j pass my code review ;-)
3) Please use System.arraycopy(ids, k, newIds, 0, iterationLength) instead of the for loop. It is much faster.
4) I would replace the "threadsCount" variable with a saner heuristic. If the list of IDs passed in is long, it can produce an unwanted result.
I found this works best in our system:
Runtime runtime = Runtime.getRuntime();
ExecutorService executor = Executors.newFixedThreadPool(runtime.availableProcessors());
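Putting these suggestions together, a sketch of the chunking loop could look like the following (the ChunkSubmitter name and the fixed chunk size of 100 are illustrative, not the poster's actual code; Arrays.copyOfRange uses System.arraycopy under the hood):

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ChunkSubmitter {
    // Splits ids into consecutive chunks of at most chunkSize elements;
    // the last chunk may be shorter. No manual index juggling with k and j.
    static long[][] chunk(long[] ids, int chunkSize) {
        int chunks = (ids.length + chunkSize - 1) / chunkSize; // ceiling division
        long[][] out = new long[chunks][];
        for (int c = 0; c < chunks; c++) {
            int from = c * chunkSize;
            int to = Math.min(from + chunkSize, ids.length);
            out[c] = Arrays.copyOfRange(ids, from, to); // arraycopy internally
        }
        return out;
    }

    static void processAll(long[] ids, Consumer<long[]> worker)
            throws InterruptedException {
        // Pool sized by available cores, not by ids.length / 100.
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (long[] part : chunk(ids, 100)) {
            pool.execute(() -> worker.accept(part));
        }
        pool.shutdown();
        // Block until all chunks finish instead of returning immediately.
        pool.awaitTermination(15, TimeUnit.MINUTES);
    }
}
```

Note that each chunk is exactly as long as its data, so no zero-padded trailing slots reach the worker, unlike the fixed `new long[100]` in the question.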
A few things I tried to optimize the code; it now fetches results very fast and the system no longer gets stuck:
Converted the arrays of 100 elements into lists of 100 elements and put each list into a HashMap. The ArrayLists consistently performed better than the arrays here:
idList = Arrays.stream(ids.getIDs()).boxed().collect(Collectors.toList());
listMap.put(listMap.size() + 1, idList);
Updated the processIds method and added some parallel processing with the parallel stream API:
userListMap.entrySet().parallelStream().forEach(entry -> {
    log.info("now inside map : key " + entry.getKey() + " -- value size :" + entry.getValue().size());
    List<List<Long>> partition = Lists.partition(entry.getValue(), 100);
    partition.stream().parallel().forEach(list -> {
        log.info("now inside list of size:" + list.size());
        executorService.submit(new MyThread(list.stream().mapToLong(l -> l).toArray(), service, repo,
                tokens.iterator().next()));
    });
});
log.info("now shutting down executor service");
executorService.shutdown();
log.info("*****waiting for task to be completed*****");
System.out.println("*****waiting for task to be completed*****");
try {
    executorService.awaitTermination(15, TimeUnit.MINUTES);
} catch (InterruptedException e) {
    e.printStackTrace();
}
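The Lists.partition call in that snippet comes from Guava. If Guava is not on the classpath, an equivalent helper using only java.util can stand in for it (the Partition class name is an assumption for illustration):

```java
import java.util.ArrayList;
import java.util.List;

class Partition {
    // Splits a list into consecutive sublists of at most `size` elements;
    // the last sublist may be shorter, matching Guava's Lists.partition.
    static <T> List<List<T>> partition(List<T> list, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            int to = Math.min(from + size, list.size());
            // copy the view so each chunk is independent of the source list
            parts.add(new ArrayList<>(list.subList(from, to)));
        }
        return parts;
    }
}
```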
Also updated the MyThread implementation to use saveAll instead of save inside the run method:
repo.saveAll(entities);
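A sketch of that change in isolation (the repository interface and entity here are simplified stand-ins for the Spring Data types in the question; the point is a single saveAll call per 100-id chunk instead of one save per row):

```java
import java.util.ArrayList;
import java.util.List;

class BatchSaveSketch {
    // Minimal stand-in for a Spring Data repository.
    interface UserRepo {
        void saveAll(Iterable<String> entities); // String stands in for UserDetails
    }

    // Builds one entity per fetched user, then persists the whole batch
    // with one repository call. Returns the number of entities saved.
    static int saveBatch(List<String> userNames, UserRepo repo) {
        List<String> entities = new ArrayList<>();
        for (String name : userNames) {
            entities.add(name); // real code: new UserDetails() + setName(name)
        }
        repo.saveAll(entities); // one call per chunk instead of one per row
        return entities.size();
    }
}
```

With a real JpaRepository, saveAll still issues one insert per entity unless JDBC batching is enabled (e.g. spring.jpa.properties.hibernate.jdbc.batch_size), but it avoids per-row transaction overhead in the loop.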
So: using ArrayLists instead of arrays, processing the HashMap and the lists with parallel streams, and batch-saving all entities with saveAll were the tricks that helped optimize this.
=======================================