Apache Spark: running jobs in parallel

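Here is my driver program (pseudocode):

df10 = spark.read(f10)
// cross join each pair and report the count
cdf1 = df1.crossJoin(df2)
print cdf1.count
...
cdf5 = df9.crossJoin(df10)
print cdf5.count

When I run spark-submit and open the tracker UI, I see that the jobs are executed one after another. I expected the loads to happen in parallel and the cross joins to happen in parallel.

Where is my mistake?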

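You cannot run different actions in parallel from a single thread: each action, such as count, blocks the driver thread until its job finishes, so the next one is only submitted afterwards. To execute two actions concurrently, submit each one from its own thread, for example with Future tasks: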

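import java.util.concurrent.{Callable, Executors}

// A small thread pool; each submitted Callable triggers its Spark action from its own thread.
val executorService = Executors.newFixedThreadPool(8)

val future1 = executorService.submit(new Callable[Long]() {
  // count is the action, so it must be called on the joined DataFrame, not on df1
  override def call(): Long = df1.crossJoin(df2).count()
})

val future2 = executorService.submit(new Callable[Long]() {
  override def call(): Long = df1.crossJoin(df3).count()
})

// get() blocks until the corresponding job has finished
println(future1.get())
println(future2.get())

// Release the pool threads so the driver can exit
executorService.shutdown()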
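
The same idea can also be written with Scala's own scala.concurrent.Future instead of raw Callable objects. The following is only a minimal sketch: runInParallel and ParallelActions are hypothetical names introduced for illustration, and the tasks in main are plain stand-ins so that the block runs without a Spark session. It starts every task before waiting on any of them, which is what lets the blocking calls overlap.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelActions {
  // Hypothetical helper: runs each blocking task on a pool thread and
  // returns the results in the same order as the tasks were given.
  def runInParallel[T](tasks: Seq[() => T], threads: Int): Seq[T] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Start every task first, then wait for all of them together.
      val futures = tasks.map(task => Future(task()))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    // Stand-ins for Spark actions (assumption: in a real driver these would be
    // closures such as () => df1.crossJoin(df2).count()).
    val tasks: Seq[() => Long] = Seq(
      () => { Thread.sleep(200); 1L },
      () => { Thread.sleep(200); 2L }
    )
    runInParallel(tasks, threads = 2).foreach(println)
  }
}
```

In a driver, each pair from the question would be passed as one closure, for example () => df1.crossJoin(df2).count(). Whether the resulting jobs really overlap on the cluster still depends on the executors having free capacity.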
