Why is multithreaded code with CompletableFuture slower than single-threaded code?

asynchronous java-8

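I am trying to improve the performance of existing code in my project, which currently runs in a single thread. The code does the following:

1. Get a first list of 10000000 objects.
2. Get a second list of 10000000 objects.
3. Combine the two (after some changes) into a third list.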

    Instant s = Instant.now();
    List<Integer> l1 = getFirstList();
    List<Integer> l2 = getSecondList();
    List<Integer> l3 = new ArrayList<>();
    l3.addAll(l1);
    l3.addAll(l2);
    Instant e = Instant.now();
    System.out.println("Execution time: " + Duration.between(s, e).toMillis());

Here are the sample methods that fetch and combine the lists:

    private static List<Integer> getFirstList() {
        System.out.println("First list is being created by: " + Thread.currentThread().getName());
        List<Integer> l = new ArrayList<>();
        for (int i = 0; i < 10000000; i++) {
            l.add(i);
        }
        return l;
    }

    private static List<Integer> getSecondList() {
        System.out.println("Second list is being created by: " + Thread.currentThread().getName());
        List<Integer> l = new ArrayList<>();
        for (int i = 10000000; i < 20000000; i++) {
            l.add(i);
        }
        return l;
    }

    private static List<Integer> combine(List<Integer> l1, List<Integer> l2) {
        System.out.println("Third list is being created by: " + Thread.currentThread().getName());
        ArrayList<Integer> l3 = new ArrayList<>();
        l3.addAll(l1);
        l3.addAll(l2);
        return l3;
    }

I am trying to rewrite the code above as follows:

    ExecutorService executor = Executors.newFixedThreadPool(10);
    Instant start = Instant.now();
    CompletableFuture<List<Integer>> cf1 = CompletableFuture.supplyAsync(() -> getFirstList(), executor);
    CompletableFuture<List<Integer>> cf2 = CompletableFuture.supplyAsync(() -> getSecondList(), executor);

    CompletableFuture<Void> cf3 = cf1.thenAcceptBothAsync(cf2, (l1, l2) -> combine(l1, l2), executor);
    try {
        cf3.get();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
    Instant end = Instant.now();
    System.out.println("Execution time: " + Duration.between(start, end).toMillis());

    executor.shutdown();

The single-threaded code takes 4-5 seconds to execute, while the multithreaded code takes more than 6 seconds. Am I doing something wrong?

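In the single-threaded variant, l3.addAll(l1); l3.addAll(l2); takes the elements of l1 and l2 from the processor cache (they were put there while getFirstList and getSecondList were executing).

In the parallel variant, the method combine() runs on a different processor core with an empty cache and has to fetch all elements from main memory, which is much slower.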

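You are executing these methods for the first time, so they start in interpreted mode. To speed up their first execution, the optimizer has to replace them while they are running (called on-stack replacement), which does not always give the same performance as re-entering the optimized result. Doing this concurrently seems to be even worse, at least for Java 8, as I get entirely different results with Java 11.

So the first step is to insert an explicit call, e.g. getFirstList(); getSecondList();, before the measurement, to see how the code performs when it is not being invoked for the first time.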
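
A minimal sketch of that warm-up idea, assuming a hypothetical helper buildList that stands in for getFirstList() and getSecondList() from the question. It simply runs the same work twice and times each run; the numbers will vary by JVM and machine, and for serious measurements a benchmark harness would be a better fit.

```java
import java.util.ArrayList;
import java.util.List;

public class WarmupDemo {
    // Hypothetical stand-in for getFirstList()/getSecondList() from the question.
    static List<Integer> buildList(int from, int to) {
        List<Integer> l = new ArrayList<>();
        for (int i = from; i < to; i++) {
            l.add(i);
        }
        return l;
    }

    // Builds both lists once and returns the elapsed time in milliseconds.
    static long timedRunMillis(int n) {
        long start = System.nanoTime();
        List<Integer> l1 = buildList(0, n);
        List<Integer> l2 = buildList(n, 2 * n);
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        // Touch the results so the work is actually used.
        if (l1.size() + l2.size() != 2 * n) {
            throw new AssertionError("unexpected list sizes");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        // The first call runs the methods "cold"; the second call re-enters them
        // after the JIT compiler has had a chance to optimize them.
        System.out.println("Cold run: " + timedRunMillis(n) + " ms");
        System.out.println("Warm run: " + timedRunMillis(n) + " ms");
    }
}
```

Comparing the two printed times gives a rough idea of how much of the original measurement was first-invocation cost rather than the work itself.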
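Another aspect is garbage collection. Some JVMs start with a small initial heap and perform a full GC every time the heap is expanded, which affects all threads.

So the second step is to start with -Xms1G (or even better, -Xms2G), so that the initial heap size is reasonable for the number of objects you are going to create.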
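
To check whether the heap option took effect, one option is to print what the JVM reports at startup. This is only a sketch; the class name HeapInfo and the helper toMiB are made up for illustration, and the reported value may differ slightly from the requested size depending on the collector.

```java
public class HeapInfo {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() is the heap currently reserved by the JVM,
        // maxMemory() is the limit it may grow to.
        System.out.println("Current heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("Max heap (MiB): " + toMiB(rt.maxMemory()));
    }
}
```

Running it once as java HeapInfo and once as java -Xms2G HeapInfo should show the difference in the current heap size.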
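But note that the third step, adding the intermediate result lists to the final result list (which happens sequentially in either case), has a significant impact on performance.

So the third step is to replace the construction of the final list with l3 = new ArrayList<>(l1.size() + l2.size()) in both variants, to ensure that the list has an appropriate initial capacity.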
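
A sketch of the presized combine step in the multithreaded variant. Two things here are my own assumptions rather than the question's code: the helper range replaces getFirstList()/getSecondList(), and thenCombine is used instead of thenAcceptBothAsync so that the merged list is actually returned to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PresizedCombine {
    // Hypothetical stand-in for getFirstList()/getSecondList().
    static List<Integer> range(int from, int to) {
        List<Integer> l = new ArrayList<>();
        for (int i = from; i < to; i++) {
            l.add(i);
        }
        return l;
    }

    // Allocates the result with its final capacity up front, so addAll
    // never has to grow and copy the backing array.
    static List<Integer> combine(List<Integer> l1, List<Integer> l2) {
        List<Integer> l3 = new ArrayList<>(l1.size() + l2.size());
        l3.addAll(l1);
        l3.addAll(l2);
        return l3;
    }

    // Builds both halves on a small pool and merges them when both are done.
    static List<Integer> buildAsync(int n) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<List<Integer>> cf1 =
                    CompletableFuture.supplyAsync(() -> range(0, n), executor);
            CompletableFuture<List<Integer>> cf2 =
                    CompletableFuture.supplyAsync(() -> range(n, 2 * n), executor);
            // join() waits for the merged list and rethrows failures unchecked.
            return cf1.thenCombine(cf2, PresizedCombine::combine).join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<Integer> result = buildAsync(10_000_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Combined " + result.size() + " elements in " + millis + " ms");
    }
}
```

The same combine method can be called from the single-threaded variant, which keeps the comparison between the two fair.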
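Under Java 8, the combination of these steps resulted in less than a second for the sequential execution and less than half a second for the multithreaded execution.

For Java 11, which had a far better starting point and needed only about a second out of the box, these improvements gave a less dramatic speedup. It also seems to have a much higher memory consumption for this code.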