Visualization of Java Stream parallelization
It is usually not clear how exactly a parallel stream splits its input into chunks, or in which order the chunks are joined. Is there any way to visualize the whole procedure for any stream source, to better understand what is going on? Suppose I create a stream like this:
Stream<Integer> stream = IntStream.range(0, 100).boxed().parallel();
I would like a diagram showing that the whole input range [0..99] is split into the ranges [0..49] and [50..99], which in turn are split further. Of course, such a diagram should reflect the actual work of the Stream API, so if I perform a real operation on such a stream, the splitting should be performed in the same way.
The current Stream API implementation uses the collector's combiner to merge intermediate results in exactly the same way as they were previously split. Also, the splitting strategy depends on the source and on the parallelism level of the common pool, but not on the exact reduction operation used (it is the same for reduce, collect, forEach, count and so on). Relying on this, it is not very difficult to create a visualizing collector:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collector;
import java.util.stream.Stream;

public static Collector<Object, ?, List<String>> parallelVisualize() {
    // One node of the split tree: leaves accumulate elements,
    // inner nodes are created by combine().
    class Range {
        private String first, last;
        private Range left, right;

        // Accumulator: remember the first and the most recent element of this chunk.
        void accept(Object obj) {
            if (first == null)
                first = obj.toString();
            else
                last = obj.toString();
        }

        // Combiner: build a parent node that spans both children.
        Range combine(Range that) {
            Range p = new Range();
            p.first = first == null ? that.first : first;
            p.last = Stream
                    .of(that.last, that.first, this.last, this.first)
                    .filter(Objects::nonNull).findFirst().orElse(null);
            p.left = this;
            p.right = that;
            return p;
        }

        // Places s at offset 'left' inside a blank string of length 'len'.
        String pad(String s, int left, int len) {
            if (len == s.length())
                return s;
            char[] result = new char[len];
            Arrays.fill(result, ' ');
            s.getChars(0, s.length(), result, left);
            return new String(result);
        }

        // Finisher: render this node and its subtrees as lines of text.
        public List<String> finish() {
            String cur = toString();
            if (left == null) {
                return Collections.singletonList(cur);
            }
            List<String> l = left.finish();
            List<String> r = right.finish();
            int len1 = l.get(0).length();
            int len2 = r.get(0).length();
            int totalLen = len1 + len2 + 1;
            int leftAdd = 0;
            if (cur.length() < totalLen) {
                cur = pad(cur, (totalLen - cur.length()) / 2, totalLen);
            } else {
                leftAdd = (cur.length() - totalLen) / 2;
                totalLen = cur.length();
            }
            List<String> result = new ArrayList<>();
            result.add(cur);
            // Connector line: ____/\____
            char[] dashes = new char[totalLen];
            Arrays.fill(dashes, ' ');
            Arrays.fill(dashes, len1 / 2 + leftAdd + 1, len1 + len2 / 2 + 1
                    + leftAdd, '_');
            int mid = totalLen / 2;
            dashes[mid] = '/';
            dashes[mid + 1] = '\\';
            result.add(new String(dashes));
            // Vertical bars above the two children.
            Arrays.fill(dashes, ' ');
            dashes[len1 / 2 + leftAdd] = '|';
            dashes[len1 + len2 / 2 + 1 + leftAdd] = '|';
            result.add(new String(dashes));
            // Join the child renderings side by side, padding the shorter one.
            int maxSize = Math.max(l.size(), r.size());
            for (int i = 0; i < maxSize; i++) {
                String lstr = l.size() > i ? l.get(i) : String.format("%"
                        + len1 + "s", "");
                String rstr = r.size() > i ? r.get(i) : String.format("%"
                        + len2 + "s", "");
                result.add(pad(lstr + " " + rstr, leftAdd, totalLen));
            }
            return result;
        }

        public String toString() {
            if (first == null)
                return "(empty)";
            else if (last == null)
                return "[" + first + "]";
            return "[" + first + ".." + last + "]";
        }
    }
    return Collector.of(Range::new, Range::accept, Range::combine,
            Range::finish);
}
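Splitting of a simple range, IntStream.range(0, 100).boxed().parallel().collect(parallelVisualize()).forEach(System.out::println), is an even split into 16 tasks: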
[0..99]
___________________________________/\________________________________
| |
[0..49] [50..99]
_________________/\______________ _________________/\________________
| | | |
[0..24] [25..49] [50..74] [75..99]
________/\_____ ________/\_______ ________/\_______ ________/\_______
| | | | | | | |
[0..11] [12..24] [25..36] [37..49] [50..61] [62..74] [75..86] [87..99]
___/\_ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___
| | | | | | | | | | | | | | | |
[0..5] [6..11] [12..17] [18..24] [25..30] [31..36] [37..42] [43..49] [50..55] [56..61] [62..67] [68..74] [75..80] [81..86] [87..92] [93..99]
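As a cross-check of the top of this tree, the source spliterator can be split by hand. This is only a minimal sketch and not part of the original answer: the class name ManualSplitDemo and the helper describe are made up for illustration, and the exact halves are what current OpenJDK builds produce for small ranges, not a documented guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

public class ManualSplitDemo {
    // Hypothetical helper: drains a spliterator and renders it as "[first..last]".
    static String describe(Spliterator.OfInt sp) {
        List<Integer> seen = new ArrayList<>();
        IntConsumer collect = i -> seen.add(i);
        sp.forEachRemaining(collect);
        if (seen.isEmpty()) {
            return "(empty)";
        }
        return "[" + seen.get(0) + ".." + seen.get(seen.size() - 1) + "]";
    }

    // Splits the range [0, 100) twice by hand and returns the three resulting chunks.
    static List<String> splitTwice() {
        Spliterator.OfInt right = IntStream.range(0, 100).spliterator();
        Spliterator.OfInt left = right.trySplit();     // prefix; 'right' keeps the suffix
        Spliterator.OfInt leftLeft = left.trySplit();  // split the prefix once more
        List<String> chunks = new ArrayList<>();
        chunks.add(describe(leftLeft));
        chunks.add(describe(left));
        chunks.add(describe(right));
        return chunks;
    }

    public static void main(String[] args) {
        splitTwice().forEach(System.out::println);
    }
}
```

The three chunks correspond to the nodes [0..24], [25..49] and [50..99] of the tree; the framework simply keeps calling trySplit until the chunks are small enough for the current parallelism level.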
Splitting of a concatenation of two streams:
IntStream
    .concat(IntStream.range(0, 10), IntStream.range(10, 100))
    .boxed().parallel().collect(parallelVisualize())
    .forEach(System.out::println);
As you can see, the first split un-concatenates the streams:
[0..99]
_______________________________________________________________________/\_____
| |
[0..9] [10..99]
__/\__ ___________________________________/\__________________________________
| | | |
[0..4] [5..9] [10..54] [55..99]
_________________/\________________ _________________/\________________
| | | |
[10..31] [32..54] [55..76] [77..99]
________/\_______ ________/\_______ ________/\_______ ________/\_______
| | | | | | | |
[10..20] [21..31] [32..42] [43..54] [55..65] [66..76] [77..87] [88..99]
___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___
| | | | | | | | | | | | | | | |
[10..14] [15..20] [21..25] [26..31] [32..36] [37..42] [43..48] [49..54] [55..59] [60..65] [66..70] [71..76] [77..81] [82..87] [88..93] [94..99]
Splitting of a concatenation of two streams where an intermediate operation (boxed()) was performed before the concatenation:
Stream.concat(IntStream.range(0, 50).boxed().parallel(), IntStream.range(50, 100).boxed())
    .collect(parallelVisualize())
    .forEach(System.out::println);
If one of the input streams was not switched to parallel mode before the concatenation, it refuses to split:
[0..99]
___/\_________________________________
| |
[0..49] [50..99]
_________________/\______________
| |
[0..24] [25..49]
________/\_____ ________/\_______
| | | |
[0..11] [12..24] [25..36] [37..49]
___/\_ ___/\___ ___/\___ ___/\___
| | | | | | | |
[0..5] [6..11] [12..17] [18..24] [25..30] [31..36] [37..42] [43..49]
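The refusal can also be observed directly on the spliterator of each input. The sketch below is an illustration under assumptions, not part of the original answer: the class name RefuseSplitDemo and the method canSplit are hypothetical, and the result reflects how current OpenJDK builds wrap a pipeline that has an intermediate operation; it is not a behavior promised by the API documentation.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

public class RefuseSplitDemo {
    // Returns true if the spliterator of a boxed range agrees to split at least once.
    static boolean canSplit(boolean parallel) {
        IntStream source = IntStream.range(50, 100);
        if (parallel) {
            source = source.parallel();
        }
        // boxed() is an intermediate operation, so spliterator() returns a
        // wrapping spliterator instead of the plain range spliterator.
        Spliterator<Integer> sp = source.boxed().spliterator();
        return sp.trySplit() != null;
    }

    public static void main(String[] args) {
        System.out.println("sequential input splits: " + canSplit(false));
        System.out.println("parallel input splits:   " + canSplit(true));
    }
}
```

On the builds this was reasoned against, only the parallel variant splits, which matches the unsplit [50..99] leaf in the tree above.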
Splitting of flatMap:
Stream.of(0, 50)
    .flatMap(start -> IntStream.range(start, start+50).boxed().parallel())
    .parallel().collect(parallelVisualize())
    .forEach(System.out::println);
flatMap never parallelizes inside the nested streams:
[0..99]
____/\__
| |
[0..49] [50..99]
A stream from an iterator of 7000 elements with unknown size:
The splitting is really bad; everybody waits for the biggest part, [3072..6143]:
[0..6999]
_______________________/\___
| |
[0..1023] [1024..6999]
________________/\____
| |
[1024..3071] [3072..6999]
_________/\_____
| |
[3072..6143] [6144..6999]
___/\____
| |
[6144..6999] (empty)
Known-size iterator source:
StreamSupport
    .stream(Spliterators.spliterator(IntStream.range(0, 7000)
        .iterator(), 7000, Spliterator.ORDERED), true)
    .collect(parallelVisualize()).forEach(System.out::println);
Supplying the size unlocks further splitting:
[0..6999]
______________________________________________________________________________________________/\________
| |
[0..1023] [1024..6999]
_____/\__ ____________________________________________________________________/\________________________
| | | |
[0..511] [512..1023] [1024..3071] [3072..6999]
____________/\___________ ________________/\__________________________________________________
| | | |
[1024..2047] [2048..3071] [3072..6143] [6144..6999]
_____/\_____ _____/\_____ _________________________/\________________________ ___/\___________
| | | | | | | |
[1024..1535] [1536..2047] [2048..2559] [2560..3071] [3072..4607] [4608..6143] [6144..6999] (empty)
____________/\___________ ____________/\___________ _____/\_____
| | | | | |
[3072..3839] [3840..4607] [4608..5375] [5376..6143] [6144..6571] [6572..6999]
_____/\_____ _____/\_____ _____/\_____ _____/\_____
| | | | | | | |
[3072..3455] [3456..3839] [3840..4223] [4224..4607] [4608..4991] [4992..5375] [5376..5759] [5760..6143]
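The chunk boundaries at 1024, 3072 and 6144 in both iterator-based trees can be reproduced by splitting the iterator spliterators by hand. The following is a rough sketch with made-up names (IteratorBatchDemo, prefixSizes, unknownSize, knownSize); the growing batch sizes are a detail of current OpenJDK builds, not something the Spliterators documentation fixes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;

public class IteratorBatchDemo {
    // Repeatedly splits the spliterator and records the size of each split-off prefix.
    static List<Long> prefixSizes(Spliterator<Integer> sp) {
        List<Long> sizes = new ArrayList<>();
        Spliterator<Integer> prefix;
        while ((prefix = sp.trySplit()) != null) {
            sizes.add(prefix.estimateSize());
        }
        return sizes;
    }

    // Iterator source without size information.
    static Spliterator<Integer> unknownSize() {
        return Spliterators.spliteratorUnknownSize(
                IntStream.range(0, 7000).iterator(), Spliterator.ORDERED);
    }

    // The same iterator source, but with the element count supplied.
    static Spliterator<Integer> knownSize() {
        return Spliterators.spliterator(
                IntStream.range(0, 7000).iterator(), 7000, Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        System.out.println("unknown size: estimate=" + unknownSize().estimateSize()
                + ", prefixes=" + prefixSizes(unknownSize()));
        System.out.println("known size:   estimate=" + knownSize().estimateSize()
                + ", prefixes=" + prefixSizes(knownSize()));
    }
}
```

Both variants hand out prefixes of 1024, 2048, 3072 and finally 856 elements; only the initial estimate differs (Long.MAX_VALUE versus 7000). Presumably that estimate is what lets the framework decide that the known-size chunks are still worth splitting further.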
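Further improvements of such a collector are possible: generating a graphic image (such as SVG), tracking the thread that processes each node, displaying the number of elements per group, and so on. Use it if you like.
I want to add a solution for monitoring the splitting on the source side, or even at intermediate operations (with some restrictions imposed by the current Stream API implementation):
will print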
[0..99]
___________________________________/\________________________________
| |
[0..49] [50..99]
_________________/\______________ _________________/\________________
| | | |
[0..24] [25..49] [50..74] [75..99]
________/\_____ ________/\_______ ________/\_______ ________/\_______
| | | | | | | |
[0..11] [12..24] [25..36] [37..49] [50..61] [62..74] [75..86] [87..99]
___/\_ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___
| | | | | | | | | | | | | | | |
[0..5] [6..11] [12..17] [18..24] [25..30] [31..36] [37..42] [43..49] [50..55] [56..61] [62..67] [68..74] [75..80] [81..86] [87..92] [93..99]
whereas
try(Stream<String> s=proxy(IntStream.range(0, 100).parallel().filter(i -> i/20%2==0)
        .mapToObj(ix->"\""+ix+'"'))) {
    s.forEach(str->{});
}
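As we can see here, we are monitoring the result of .filter(…).mapToObj(…), but the chunks are clearly determined by the source, possibly producing empty chunks downstream depending on the filter's condition.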
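The code of the proxy helper used above does not appear in this text. The sketch below is therefore not that helper but a simplified, hypothetical stand-in for the same idea: a delegating Spliterator that records every trySplit call. The class name LoggingSpliterator, the log format and the demo method are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.IntStream;

public class LoggingSpliterator<T> implements Spliterator<T> {
    private final Spliterator<T> delegate;
    private final List<String> log; // shared by all spliterators split off the same root

    public LoggingSpliterator(Spliterator<T> delegate, List<String> log) {
        this.delegate = delegate;
        this.log = log;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        return delegate.tryAdvance(action);
    }

    @Override
    public Spliterator<T> trySplit() {
        long before = delegate.estimateSize();
        Spliterator<T> prefix = delegate.trySplit();
        if (prefix == null) {
            log.add("refused to split " + before);
            return null;
        }
        log.add("split " + before + " into " + prefix.estimateSize()
                + " + " + delegate.estimateSize());
        // Wrap the prefix too, so that nested splits are logged as well.
        return new LoggingSpliterator<>(prefix, log);
    }

    @Override
    public long estimateSize() {
        return delegate.estimateSize();
    }

    @Override
    public int characteristics() {
        return delegate.characteristics();
    }

    @Override
    public Comparator<? super T> getComparator() {
        return delegate.getComparator(); // needed when the source reports SORTED
    }

    // Splits a wrapped range twice by hand and returns what was logged.
    static List<String> demo() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Spliterator<Integer> source = IntStream.range(0, 100).spliterator();
        Spliterator<Integer> root = new LoggingSpliterator<>(source, log);
        Spliterator<Integer> prefix = root.trySplit();
        prefix.trySplit();
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

To watch a real pipeline, the wrapper can be passed to StreamSupport.stream(new LoggingSpliterator<>(source, log), true); the log list should then be thread-safe, as in demo, because splits happen on several worker threads.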
Note that we can combine the source monitoring with Tagir's collector monitoring:
try(IntStream s=proxy(IntStream.range(0, 100))) {
    s.parallel().filter(i -> i/20%2==0)
        .boxed().collect(parallelVisualize())
        .forEach(System.out::println);
}
This will print (note that the collect output is printed first):
[0..99]
________________________________/\_______________________________
| |
[0..49] [50..99]
________________/\______________ _______________/\_______________
| | | |
[0..19] [40..49] [50..59] [80..99]
________/\_____ ________/\______ _______/\_______ ________/\_____
| | | | | | | |
[0..11] [12..19] (empty) [40..49] [50..59] (empty) [80..86] [87..99]