
Concurrent transformations on an RDD inside Spark Streaming's foreachRDD function (Java)


In the code below, the functions fn1 and fn2 appear to be applied to inRDD sequentially, as I can see in the Stages section of the Spark web UI.

 DstreamRDD1.foreachRDD(new VoidFunction<JavaRDD<String>>()
 {
     public void call(JavaRDD<String> inRDD)
     {
         inRDD.foreach(fn1);
         inRDD.foreach(fn2);
     }
 });

Both foreach on an RDD and foreachRDD on a DStream will run sequentially, because they are output actions: they force materialization of the execution graph. This is not the case for Spark's ordinary lazy transformations, which can run in parallel when the execution graph diverges into separate stages.

例如:

val dStream: DStream[String] = ???
val first = dStream.filter(x => x.contains("h"))
val second = dStream.filter(x => !x.contains("h"))

first.print()
second.print()

The first part, the two filter transformations, does not need to execute sequentially when you have enough cluster resources to run the underlying stages in parallel. The calls to print, which are again output actions, will then cause the results to be printed one after the other.

Both calls run sequentially, not in parallel.
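If the goal is to have fn1 and fn2 run concurrently on the same RDD, one common approach (not from the original answer, and only workable when the cluster has spare resources) is to submit the two output actions as separate Spark jobs from separate driver threads, optionally with spark.scheduler.mode=FAIR. The sketch below illustrates the driver-side threading pattern in plain JDK terms, with Thread.sleep standing in for the two Spark jobs; the class name and timings are illustrative only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentJobs {

    // Submits two stand-in "jobs" from separate threads and returns the
    // total elapsed time in milliseconds.
    static long runBoth() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        long start = System.nanoTime();

        // Each task stands in for one output action, e.g. inRDD.foreach(fn1).
        // Submitting them from separate threads lets the scheduler run the
        // resulting jobs concurrently instead of one after the other.
        Future<?> job1 = pool.submit(() -> sleep(300));
        Future<?> job2 = pool.submit(() -> sleep(300));
        job1.get();
        job2.get();

        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Because the two 300 ms tasks overlap, elapsed time is close to
        // 300 ms rather than the 600 ms a sequential run would take.
        System.out.println("elapsed ms: " + runBoth());
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

With real Spark code, the same pattern applies because SparkContext is thread-safe for job submission; each thread would call its own action (inRDD.foreach(fn1) and inRDD.foreach(fn2)), and caching inRDD first avoids recomputing it for both jobs.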