How do I use multiple cores in Java with ForkJoinPool?

java parallel-processing

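I'm trying to understand how ForkJoinPool works. I want to get better performance by using it on a large array of about 2 million elements and summing their reciprocals. As I understand it, ForkJoinPool.commonPool().invoke(task) calls compute(), which splits the task into two subtasks if it is not small enough, computes them, and then joins the results. So far, we are using two cores.

But if I want to run this on more cores, how do I do that and get 4 times the performance of the usual single-threaded run? Here is my code with the default ForkJoinPool: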

        @Override
        protected void compute() {
            if (endIndexExclusive - startIndexInclusive <= seq_count) {
                // Range is small enough: sum the reciprocals sequentially.
                for (int i = startIndexInclusive; i < endIndexExclusive; i++)
                    value += 1 / input[i];
            } else {
                // Split the range in half.
                ReciprocalArraySumTask left = new ReciprocalArraySumTask(startIndexInclusive,
                        (endIndexExclusive + startIndexInclusive) / 2, input);
                ReciprocalArraySumTask right = new ReciprocalArraySumTask((endIndexExclusive + startIndexInclusive) / 2,
                        endIndexExclusive, input);
                // Run the left half asynchronously and the right half in this thread.
                left.fork();
                right.compute();
                left.join();
                value = left.value + right.value;
            }
        }
    } // end of ReciprocalArraySumTask


protected static double parArraySum(final double[] input) {
        assert input.length % 2 == 0;

        double sum = 0;

        // Compute sum of reciprocals of array elements
        ReciprocalArraySumTask task = new ReciprocalArraySumTask(0, input.length, input);
        ForkJoinPool.commonPool().invoke(task);
        return task.getValue();
    }

// Here I am trying to use 4 cores
protected static double parManyTaskArraySum(final double[] input,
                                                final int numTasks) {
        double sum = 0;
        System.out.println("Total tasks = " + numTasks);
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", String.valueOf(numTasks));
        // Compute sum of reciprocals of array elements
        int chunkSize = ReciprocalArraySum.getChunkSize(numTasks, input.length);
        System.out.println("Chunk size = " + chunkSize);
        ReciprocalArraySumTask task = new ReciprocalArraySumTask(0, input.length, input);
        ForkJoinPool pool = new ForkJoinPool();
//        pool.
        ForkJoinPool.commonPool().invoke(task);
        return task.getValue();
    }

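You want to use 4 cores, but you are submitting a job that only needs two cores. In the example below, the getChunkStartInclusive and getChunkEndExclusive methods give the start and end index of each chunk. I believe the code below solves your problem and gives you some implementation ideas.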

protected static double parManyTaskArraySum(final double[] input,
        final int numTasks) {
    double sum = 0;
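    // Note: this property is only read when the common pool is first initialized,
    // so setting it after the pool has already been used has no effect.
    System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", String.valueOf(numTasks));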
    List<ReciprocalArraySumTask> ts = new ArrayList<ReciprocalArraySumTask>(numTasks);

    int i;
    for (i = 0; i < numTasks - 1 ; i++) {
        ts.add(new ReciprocalArraySumTask(getChunkStartInclusive(i,numTasks,input.length),getChunkEndExclusive(i,numTasks,input.length),input));
        ts.get(i).fork();
    }
    ts.add( new ReciprocalArraySumTask(getChunkStartInclusive(i, numTasks, input.length), getChunkEndExclusive(i, numTasks, input.length), input));
    ts.get(i).compute();

    for (int j = 0; j < numTasks - 1; j++) {
        ts.get(j).join();
    }

    for (int j = 0; j < numTasks; j++) {
        sum += ts.get(j).getValue();
    }
    return sum;
}
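
The answer above relies on getChunkStartInclusive and getChunkEndExclusive without showing them. As a rough sketch only, here is one way such helpers could be written, assuming ceiling division so that the last chunk absorbs any shortfall. The class name ChunkBounds and the method bodies are assumptions for illustration, not the original code; only the method names and argument order follow the calls in the question and answer.

```java
public class ChunkBounds {
    // Assumed helper: chunk size rounded up so that nChunks chunks cover all elements.
    public static int getChunkSize(int nChunks, int nElements) {
        return (nElements + nChunks - 1) / nChunks;
    }

    // Assumed helper: first index (inclusive) of the given chunk, clamped to the array length.
    public static int getChunkStartInclusive(int chunk, int nChunks, int nElements) {
        int chunkSize = getChunkSize(nChunks, nElements);
        return Math.min(chunk * chunkSize, nElements);
    }

    // Assumed helper: last index (exclusive) of the given chunk, clamped to the array length.
    public static int getChunkEndExclusive(int chunk, int nChunks, int nElements) {
        int chunkSize = getChunkSize(nChunks, nElements);
        return Math.min((chunk + 1) * chunkSize, nElements);
    }

    public static void main(String[] args) {
        int nElements = 10;
        int nChunks = 4;
        // Print the half-open index range covered by each chunk.
        for (int i = 0; i < nChunks; i++) {
            int start = getChunkStartInclusive(i, nChunks, nElements);
            int end = getChunkEndExclusive(i, nChunks, nElements);
            System.out.println("chunk " + i + ": [" + start + ", " + end + ")");
        }
    }
}
```

With 10 elements and 4 chunks, this gives the ranges [0, 3), [3, 6), [6, 9) and [9, 10), so the chunks are contiguous and cover the whole array even when the length is not divisible by the number of tasks.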

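Here is my approach:

The threshold is the range size at which a task stops splitting recursively and computes its sum directly. It works better if each processor gets two or more tasks. Of course, there is a limit to how far this helps, which is why I use numTasks * 2.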

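So even in this implementation, my compute method stays the same?

No. The for loop can stay in compute without the if-else, and you can move the else part into parArraySum, because you don't need that part for parManyTaskArraySum.

I tried that, but I didn't get any performance improvement. I have 2 cores with 2 logical processors each, so 4 in total. Am I missing something, or is this expected given this configuration?

How many cores does your machine have? Also, it really depends on your machine's performance, but I would expect a speedup. You can also look at my repository; I made it public for you.

For some reason I think this may not be the right way to code it. You are just spreading the work over 4 processors, and I'm not sure that even happens in your code, because you are using the default ForkJoinPool, which only uses 2 cores. I also think it should work by dividing the job into separate tasks and then scheduling those tasks one by one onto each of the 4 available cores.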

 protected static double parManyTaskArraySum(final double[] input,
                                         final int numTasks) {
     int start;
     int end;

     int size = input.length;
     int threshold = size / (numTasks * 2);

     List<ReciprocalArraySumTask> actions = new ArrayList<>();

     for (int i = 0; i < numTasks; i++) {
         start = getChunkStartInclusive(i, numTasks, size);
         end = getChunkEndExclusive(i, numTasks, size);
         actions.add(new ReciprocalArraySumTask(start, end, input, threshold, i));
     }
     ForkJoinTask.invokeAll(actions);

     return actions.stream().map(ReciprocalArraySumTask::getValue).reduce(0.0, Double::sum);

  }
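
The comments above question whether the default pool really uses all 4 logical processors. One way to take that doubt out of the picture is to create a dedicated ForkJoinPool with an explicit parallelism level instead of relying on the common pool. The following is a minimal, self-contained sketch of that idea. The class names ReciprocalSumDemo and ReciprocalSum and the method parallelSum are assumptions for illustration, and it uses RecursiveTask<Double> rather than the ReciprocalArraySumTask from the question. The threshold follows the size / (numTasks * 2) rule from the second answer.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ReciprocalSumDemo {

    // Hypothetical task: sums 1/x over input[lo, hi), splitting until a range is at most threshold long.
    static class ReciprocalSum extends RecursiveTask<Double> {
        private final double[] input;
        private final int lo;
        private final int hi;
        private final int threshold;

        ReciprocalSum(double[] input, int lo, int hi, int threshold) {
            this.input = input;
            this.lo = lo;
            this.hi = hi;
            this.threshold = threshold;
        }

        @Override
        protected Double compute() {
            if (hi - lo <= threshold) {
                // Small enough: compute sequentially.
                double sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += 1 / input[i];
                }
                return sum;
            }
            int mid = lo + (hi - lo) / 2;
            ReciprocalSum left = new ReciprocalSum(input, lo, mid, threshold);
            ReciprocalSum right = new ReciprocalSum(input, mid, hi, threshold);
            left.fork();                        // schedule the left half on the pool
            double rightSum = right.compute();  // compute the right half in this thread
            return left.join() + rightSum;
        }
    }

    // Runs the sum on a dedicated pool with the requested parallelism.
    public static double parallelSum(double[] input, int parallelism) {
        // About two leaf tasks per worker, as suggested in the answer above.
        int threshold = Math.max(1, input.length / (parallelism * 2));
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new ReciprocalSum(input, 0, input.length, threshold));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] input = new double[2_000_000];
        Arrays.fill(input, 2.0);
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("Sum of reciprocals: " + parallelSum(input, 4));
    }
}
```

Printing the available processors and the common pool parallelism shows what the default pool is actually configured with on a given machine. Whether the dedicated pool gets close to a 4x speedup depends on the hardware; with 2 physical cores and 4 logical processors, the gain may well be smaller than 4x.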