Triggering Spark groupBy-count operations in parallel using Scala Futures
I have a DataFrame with 50 columns, and I want to run a groupBy-and-count on 10 of those columns in parallel:
val parallelism = Columns.length
val executor = Executors.newFixedThreadPool(parallelism)
val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)

val tasks: Seq[String] = groupByOneCountColumns
val results = tasks.map { query =>
  Future {
    // Spark stuff here
    val groupByCount: Array[ResponseOnGroupByCount] = srcDF
      .groupBy(query)
      .count()
      .map(x => ResponseOnGroupByCount(query.toString, x.getString(0), x.getLong(1)))(encoder)
      .collect()
    result += Json.toJson(groupByCount)
  }(ec)
}

val allDone = Future.sequence(results)
// wait for results
Await.result(allDone, scala.concurrent.duration.Duration.Inf)
executor.shutdown() // otherwise the JVM will probably not exit
This throws:
Task not serializable
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: scala.concurrent.impl.ExecutionContextImpl
I have also tried parallelizing with `.par` collections, which led to a different problem. I think answering this question would help with that as well.
Thanks in advance.
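The stack trace points at `scala.concurrent.impl.ExecutionContextImpl` being dragged into a Spark task closure: the lambda passed to `Dataset.map` forces Spark to serialize its enclosing scope, which here holds the non-serializable `ExecutionContext`. One way around it is to keep all user lambdas off the executors: do the aggregation with `groupBy(...).count()`, `collect()` the plain `Row`s, and build the result objects on the driver. The sketch below assumes a simple `ResponseOnGroupByCount` case class shaped like the one in the question (its real definition is not shown there); it also has each `Future` return its results instead of appending to a shared `result` collection, which is not thread-safe.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical stand-in for the question's ResponseOnGroupByCount.
case class ResponseOnGroupByCount(column: String, value: String, count: Long)

// Sketch: one Future per column. No user lambda is shipped to Spark
// executors, so nothing in this scope needs to be serializable.
def groupByCounts(srcDF: org.apache.spark.sql.DataFrame,
                  columns: Seq[String]): Seq[Array[ResponseOnGroupByCount]] = {
  val executor = Executors.newFixedThreadPool(columns.length)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)
  try {
    val futures = columns.map { colName =>
      Future {
        // groupBy(...).count() runs entirely inside Spark; collect()
        // brings plain Rows to the driver, where the case class is built.
        srcDF.groupBy(colName).count().collect().map { row =>
          ResponseOnGroupByCount(colName, String.valueOf(row.get(0)), row.getLong(1))
        }
      }
    }
    // Each Future returns its own array; sequencing them replaces the
    // unsynchronized `result +=` from the original code.
    Await.result(Future.sequence(futures), Duration.Inf)
  } finally {
    executor.shutdown() // otherwise the JVM may not exit
  }
}
```

This is only a sketch under those assumptions; if the typed `Dataset.map` is required, the alternative is to make sure the mapping function closes over nothing but serializable values (e.g. define it in a top-level serializable object).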