Will a Java parallel stream work fine with the distinct operation?


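I was reading about statelessness and came across this in the documentation:

Stream pipeline results may be nondeterministic or incorrect if the behavioral parameters to the stream operations are stateful. A stateful lambda (or other object implementing the appropriate functional interface) is one whose result depends on any state which might change during the execution of the stream pipeline.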
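To make the quoted definition concrete, here is a minimal sketch contrasting a stateless mapping function with a stateful predicate. The class and method names are made up for illustration. The stateful version uses a thread-safe set, so it does not lose elements, but which of several equal elements passes the filter depends on thread timing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are illustrative only.
class StatefulLambdaDemo {

    // Stateless: the result for each element depends only on that element.
    static List<String> lowerCased(List<String> input) {
        return input.parallelStream()
                    .map(String::toLowerCase)
                    .collect(Collectors.toList());
    }

    // Stateful: the predicate's answer depends on what other threads
    // have already added to the shared set `seen`.
    static List<String> firstSeenOnly(List<String> input) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        return input.parallelStream()
                    .filter(seen::add) // true only for the first add of an equal element
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "B", "a", "b", "A");
        System.out.println(lowerCased(input));    // prints [a, b, a, b, a]
        // Always four elements, but the position of "a" may vary between runs.
        System.out.println(firstSeenOnly(input));
    }
}
```

The stateless pipeline gives the same list on every run. The stateful one always yields the same set of values here, but not necessarily the same list, which is the kind of nondeterminism the documentation warns about.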

Now, if I have a list of strings (strList, say) and try to remove duplicate strings from it using a parallel stream in the following way:

List<String> resultOne = strList.parallelStream().distinct().collect(Collectors.toList());


or, if we want it to be case-insensitive:

List<String> result2 = strList.parallelStream().map(String::toLowerCase)
                       .distinct().collect(Collectors.toList());


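Can this code have any problem? Parallel streams split the input, and being distinct within one chunk does not necessarily mean being distinct in the whole input.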
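One way to check this concern empirically is to compare the parallel result against the sequential one on an input large enough to be split into many chunks. The following is only a sketch; the class name, method names, and sample words are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical check; names and sample data are illustrative only.
class ParallelDistinctCheck {

    // 100_000 words cycling through five base words (two pairs differ only in case).
    static List<String> sample() {
        List<String> base = Arrays.asList("cat", "Cat", "hat", "HAT", "bat");
        List<String> words = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            words.add(base.get(i % base.size()));
        }
        return words;
    }

    static List<String> distinctSequential(List<String> in) {
        return in.stream().distinct().collect(Collectors.toList());
    }

    static List<String> distinctParallel(List<String> in) {
        return in.parallelStream().distinct().collect(Collectors.toList());
    }

    static List<String> distinctParallelIgnoringCase(List<String> in) {
        return in.parallelStream()
                 .map(String::toLowerCase)
                 .distinct()
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = sample();
        System.out.println(distinctSequential(words));           // prints [cat, Cat, hat, HAT, bat]
        System.out.println(distinctParallel(words));             // prints [cat, Cat, hat, HAT, bat]
        System.out.println(distinctParallelIgnoringCase(words)); // prints [cat, hat, bat]
    }
}
```

Because the source is a List (an ordered source), the parallel result matches the sequential one element for element, not just as a set.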

EDIT (quick summary of the answers below)

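distinct is a stateful operation, and with stateful intermediate operations a parallel stream may require multiple passes over the data or substantial buffering overhead. distinct can also be implemented more efficiently if the order of the elements is not relevant. Also, as per the documentation: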

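For ordered streams, the selection of distinct elements is stable (for duplicated elements, the element appearing first in the encounter order is preserved). For unordered streams, no stability guarantees are made.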

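But for an unordered stream running in parallel, distinct may be unstable, meaning that among duplicates it keeps an arbitrary element and not necessarily the first one, as you would otherwise expect from distinct.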
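Stability only becomes visible when equal elements can still be told apart. The sketch below assumes a hypothetical Tagged class whose equals and hashCode ignore the tag field, so the tag reveals which duplicate survived. Going by the quoted documentation, an ordered stream keeps the first occurrence, while an explicitly unordered one may keep any.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical element type: equality ignores `tag`, so duplicates stay distinguishable.
final class Tagged {
    final String key;
    final int tag;

    Tagged(String key, int tag) {
        this.key = key;
        this.tag = tag;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Tagged && ((Tagged) o).key.equals(key);
    }

    @Override
    public int hashCode() {
        return key.hashCode();
    }
}

class DistinctStabilityDemo {

    // Nine elements with keys k0, k1, k2 repeating; the tag is the list index.
    static List<Tagged> sample() {
        List<Tagged> list = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            list.add(new Tagged("k" + (i % 3), i));
        }
        return list;
    }

    // Returns the tags of the elements that survive distinct().
    static List<Integer> survivingTags(Stream<Tagged> stream) {
        return stream.distinct().map(t -> t.tag).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Tagged> data = sample();
        System.out.println(survivingTags(data.stream()));         // prints [0, 1, 2]
        System.out.println(survivingTags(data.parallelStream())); // prints [0, 1, 2]
        // Unordered: still three survivors, but which tags they carry is not guaranteed.
        System.out.println(survivingTags(data.parallelStream().unordered()));
    }
}
```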

From:

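Internally, the distinct() operation keeps a Set that contains elements that have been seen previously, but it's buried inside the operation and we can't get to it from application code.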

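So for a parallel stream it would probably either consume the entire stream or use a CHM (something like ConcurrentHashMap.newKeySet()). For an ordered stream it would most likely use a LinkedHashSet or a similar construct.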
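The two strategies guessed at above can be imitated in application code. This is a hand-rolled sketch, not the JDK's actual implementation; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical imitation of set-based de-duplication; not the JDK's own code.
class HandRolledDistinct {

    // Order-preserving: each chunk fills its own LinkedHashSet and the
    // partial sets are merged in encounter order, so first occurrences win.
    static <T> List<T> orderedDistinct(Collection<T> in) {
        Set<T> seen = in.parallelStream()
                        .collect(Collectors.toCollection(LinkedHashSet::new));
        return new ArrayList<>(seen);
    }

    // Order-agnostic: all threads add to one concurrent set, with no merge step.
    // Note that ConcurrentHashMap.newKeySet() rejects null elements.
    static <T> Set<T> unorderedDistinct(Collection<T> in) {
        Set<T> seen = ConcurrentHashMap.newKeySet();
        in.parallelStream().forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(orderedDistinct(input)); // prints [b, a, c]
        // Contains a, b and c; iteration order is unspecified.
        System.out.println(unorderedDistinct(input));
    }
}
```

The ordered variant pays for merging per-chunk sets, while the unordered variant trades that for contention on a single shared set.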

There won't be a problem (a problem meaning a wrong result), but as the API note states:

Preserving stability for distinct() in parallel pipelines is relatively expensive

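But if performance is a concern and stability is not (i.e. the result may have its elements in a different order from the collection it processed), then you can follow the API note: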

Removing the ordering constraint with BaseStream.unordered() may result in significantly more efficient execution for distinct() in parallel pipelines

I thought, why not benchmark distinct:

import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.stream.Collectors;

// Wrapper class added so the snippet compiles on its own; the class name is arbitrary.
public class DistinctBenchmark {
    public static void main(String[] args) {
        List<String> strList = Arrays.asList("cat", "nat", "hat", "tat", "heart", "fat", "bat", "lad", "crab", "snob");

        List<String> words = new Vector<>();

        int wordCount = 1_000_000; // no. of words in the list words
        int avgIter = 10; // iterations to run to find average running time

        // Populate the list randomly with the strings in `strList`.
        // Math.round makes the first and last word half as likely as the others.
        for (int i = 0; i < wordCount; i++) {
            words.add(strList.get((int) Math.round(Math.random() * (strList.size() - 1))));
        }

        // Accumulate running times: pod = parallel ordered, pud = parallel unordered, sod = sequential ordered.
        long starttime, pod = 0, pud = 0, sod = 0;
        for (int i = 0; i < avgIter; i++) {
            starttime = System.currentTimeMillis();
            List<String> parallelOrderedDistinct = words.parallelStream().distinct().collect(Collectors.toList());
            pod += System.currentTimeMillis() - starttime;

            starttime = System.currentTimeMillis();
            List<String> parallelUnorderedDistinct =
                    words.parallelStream().unordered().distinct().collect(Collectors.toList());
            pud += System.currentTimeMillis() - starttime;

            starttime = System.currentTimeMillis();
            List<String> sequentialOrderedDistinct = words.stream().distinct().collect(Collectors.toList());
            sod += System.currentTimeMillis() - starttime;
        }

        // Report the average time per iteration.
        System.out.println("Parallel ordered time in ms: " + pod / avgIter);
        System.out.println("Parallel unordered time in ms: " + pud / avgIter);
        System.out.println("Sequential implicitly ordered time in ms: " + sod / avgIter);
    }
}

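In three runs with wordCount = 1_000_000 (the first three timing blocks at the end of this post), parallel unordered was about twice as slow as the other two variants.

Then I increased wordCount, with the following results (the next three timing blocks).

Then I went to 10_000_000 (the last three timing blocks).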

Roughly pointing out the relevant parts of the documentation (emphasis mine):

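Intermediate operations are further divided into stateless and stateful operations. Stateless operations, such as filter and map, retain no state from previously seen elements when processing a new element: each element can be processed independently of operations on other elements. Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.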

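Stateful operations may need to process the entire input before producing a result. For example, one cannot produce any results from sorting a stream until one has seen all elements of the stream. As a result, under parallel computation, some pipelines containing stateful intermediate operations may require multiple passes on the data or may need to buffer significant data. Pipelines containing exclusively stateless intermediate operations can be processed in a single pass, whether sequential or parallel, with minimal data buffering.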
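The "process the entire input before producing a result" behavior can be observed even on a sequential stream by logging when elements enter and leave the pipeline. This is a small sketch with made-up names. It uses peek with a side effect purely for tracing, which is only safe here because the stream is sequential.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical tracing demo; names are illustrative only.
class BarrierDemo {

    // Records when each element enters ("in") and leaves ("out") the pipeline.
    static List<String> trace(boolean withSorted) {
        List<String> events = new ArrayList<>();
        Stream<Integer> s = Stream.of(3, 1, 2).peek(x -> events.add("in:" + x));
        if (withSorted) {
            s = s.sorted(); // stateful: must see every element before emitting any
        }
        s.forEach(x -> events.add("out:" + x));
        return events;
    }

    public static void main(String[] args) {
        // Stateless only: elements flow through one at a time.
        System.out.println(trace(false)); // prints [in:3, out:3, in:1, out:1, in:2, out:2]
        // With sorted(): all input is consumed before the first output.
        System.out.println(trace(true));  // prints [in:3, in:1, in:2, out:1, out:2, out:3]
    }
}
```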

If you read further down (the section on ordering):

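Streams may or may not have a defined encounter order. Whether or not a stream has an encounter order depends on the source and the intermediate operations. Certain stream sources (such as List or arrays) are intrinsically ordered, whereas others (such as HashSet) are not. Some intermediate operations, such as sorted(), may impose an encounter order on an otherwise unordered stream, and others may render an ordered stream unordered, such as BaseStream.unordered(). Further, some terminal operations may ignore encounter order, such as forEach().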

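For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Certain aggregate operations, such as filtering duplicates (distinct()) or grouped reductions (Collectors.groupingBy()), can be implemented more efficiently if ordering of elements is not relevant. Similarly, operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. In cases where the stream has an encounter order, but the user does not particularly care about that encounter order, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. However, most stream pipelines, such as the "sum of weight of blocks" example above, still parallelize efficiently even under ordering constraints.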
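Whether a given stream currently has an encounter order can be inspected through its spliterator's ORDERED characteristic. The sketch below uses a made-up helper, isOrdered. Note that spliterator() is a terminal operation, so each stream passed in is consumed by the check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;
import java.util.stream.Stream;

// Hypothetical inspection helper; names are illustrative only.
class EncounterOrderDemo {

    // Reports whether the stream advertises an encounter order.
    static boolean isOrdered(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("cat", "hat", "bat");
        Set<String> set = new HashSet<>(list);

        System.out.println(isOrdered(list.stream()));             // prints true
        System.out.println(isOrdered(set.stream()));              // prints false
        System.out.println(isOrdered(list.stream().unordered())); // prints false
        System.out.println(isOrdered(set.stream().sorted()));     // prints true
    }
}
```

This mirrors the quoted passage: a List source is ordered, a HashSet source is not, unordered() drops the constraint, and sorted() imposes one.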

Timing output, wordCount = 1_000_000 (three runs):

Parallel ordered time in ms: 52
Parallel unordered time in ms: 81
Sequential implicitly ordered time in ms: 35

Parallel ordered time in ms: 48
Parallel unordered time in ms: 83
Sequential implicitly ordered time in ms: 34

Parallel ordered time in ms: 36
Parallel unordered time in ms: 70
Sequential implicitly ordered time in ms: 32

Timing output after the first increase of wordCount (three runs):

Parallel ordered time in ms: 93
Parallel unordered time in ms: 363
Sequential implicitly ordered time in ms: 123

Parallel ordered time in ms: 100
Parallel unordered time in ms: 363
Sequential implicitly ordered time in ms: 124

Parallel ordered time in ms: 89
Parallel unordered time in ms: 365
Sequential implicitly ordered time in ms: 118

Timing output, wordCount = 10_000_000 (three runs):

Parallel ordered time in ms: 148
Parallel unordered time in ms: 725
Sequential implicitly ordered time in ms: 218

Parallel ordered time in ms: 150
Parallel unordered time in ms: 749
Sequential implicitly ordered time in ms: 224

Parallel ordered time in ms: 143
Parallel unordered time in ms: 743
Sequential implicitly ordered time in ms: 222

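Applied to the case-insensitive example from the question, relaxing the ordering constraint looks like this:

List<String> result2 = strList.parallelStream()
                              .unordered()
                              .map(String::toLowerCase)
                              .distinct()
                              .collect(Collectors.toList());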