Apache Spark: How to reshuffle in Apache Beam using the Spark runner


I am running this simulation with the Spark runner:

PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

Pipeline p = Pipeline.create(options);
p.apply(Create.of(1))
 .apply(ParDo.of(new DoFn<Integer, Integer>() {
     @ProcessElement
     public void apply(@Element Integer element, OutputReceiver<Integer> outputReceiver) {
         // fan the single input element out into 4,000,000 integers
         IntStream.range(0, 4_000_000).forEach(outputReceiver::output);
     }
 }))
 // intended to redistribute the elements across workers
 .apply(Reshuffle.viaRandomKey())
 .apply(ParDo.of(new DoFn<Integer, Integer>() {
     @ProcessElement
     public void apply(@Element Integer element, OutputReceiver<Integer> outputReceiver) {
         try {
             // simulate an RPC call of 10 ms
             Thread.sleep(10);
         } catch (InterruptedException e) {
             // restore the interrupt flag so the runner can still see it
             Thread.currentThread().interrupt();
             e.printStackTrace();
         }
         outputReceiver.output(element);
     }
 }));
PipelineResult result = p.run();
result.waitUntilFinish();

I then run it with 8 threads

BR, Rafael.

It looks like Reshuffle in Beam on Spark ultimately boils down to

I wonder whether, in this case, rdd.context().defaultParallelism() and rdd.getNumPartitions() are both 1. I have filed an issue to investigate.

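In the meantime, you can use GroupByKey to get the desired parallelism, as you pointed out. (If your elements are not integers, you could try using a hash of the element, Math.random(), or even an incrementing counter as the key.)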
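As a rough illustration of the hash-based key idea, here is a minimal plain-Java sketch. It does not use the Beam API: `KeySketch` and `bucketKey` are hypothetical names, and `Collectors.groupingBy` only stands in for what GroupByKey would do in the pipeline. The bucket count of 8 is an assumption taken from the 8 threads mentioned in the question.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class KeySketch {
    // Hypothetical helper: maps any element to a key in [0, buckets).
    // Math.floorMod keeps the key non-negative even when hashCode() is negative,
    // which a plain % would not.
    public static int bucketKey(Object element, int buckets) {
        return Math.floorMod(element.hashCode(), buckets);
    }

    public static void main(String[] args) {
        List<String> elements = List.of("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");

        // Stand-in for GroupByKey: collect the elements that share a bucket key.
        Map<Integer, List<String>> grouped = elements.stream()
            .collect(Collectors.groupingBy(
                e -> bucketKey(e, 8), TreeMap::new, Collectors.toList()));

        grouped.forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

In a pipeline, the same key function could presumably be used inside the MapElements step that builds the KV pairs, with the number of buckets chosen to match the parallelism you want.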

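Thanks robertwb, I wasn't aware of this. For now I will use the default parallelism option in Spark, and I look forward to the bug being fixed. The real problem I ran into is with TextIO, which uses this reshuffle internally when loading files.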

.apply(MapElements.into(kvs(integers(), integers())).via(e -> KV.of(e % 8, e)))
.apply(GroupByKey.create())
.apply(Values.create())
.apply(Flatten.iterables())