Apache Spark: running jobs in parallel
This is my driver program (pseudocode):
df10 = spark.read(f10)
// cross join each pair and report the count
cdf1 = df1.crossJoin(df2)
print cdf1.count
...
cdf5 = df9.crossJoin(df10)
print cdf5.count
When I run spark-submit and open the tracker UI, I see that the jobs execute one after another. I expected each read to happen in parallel, and each cross join to happen in parallel. Where is my mistake?
You cannot run different actions in parallel from a single driver thread: each action (such as count) blocks until its job completes. To run two different actions in parallel, submit them as future tasks from separate threads. For example:
import java.util.concurrent.{Callable, Executors}

val executorService = Executors.newFixedThreadPool(8)

// Each Callable runs one blocking Spark action on its own thread,
// so the two jobs can be scheduled on the cluster concurrently.
val future1 = executorService.submit(new Callable[Long]() {
  @throws[Exception]
  override def call: Long = df1.crossJoin(df2).count
})

val future2 = executorService.submit(new Callable[Long]() {
  @throws[Exception]
  override def call: Long = df1.crossJoin(df3).count
})

// get() blocks until each job finishes and returns its count.
println(future1.get())
println(future2.get())
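An alternative sketch of the same pattern, not from the original answer, uses Scala's own scala.concurrent.Future instead of a raw ExecutorService. The slowAction calls below are hypothetical stand-ins for blocking Spark actions such as df1.crossJoin(df2).count, so the example runs without a Spark cluster:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration.Duration

object ParallelActions {
  // Dedicated pool: blocking actions should not starve the default global pool.
  implicit val ec: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

  // Stand-in for a blocking Spark action such as df1.crossJoin(df2).count.
  def slowAction(rows: Long): Long = { Thread.sleep(100); rows }

  def main(args: Array[String]): Unit = {
    // Both futures start immediately and run on separate threads.
    val future1 = Future { slowAction(4) }
    val future2 = Future { slowAction(9) }

    // Block until each job finishes, then print the counts.
    println(Await.result(future1, Duration.Inf))
    println(Await.result(future2, Duration.Inf))

    ec.shutdown() // let the non-daemon pool threads exit
  }
}
```

Both variants rely on the fact that a SparkSession is thread-safe, so several driver threads can submit jobs to the same session at once.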