Why does Spark standalone report "ExecutorLostFailure (executor driver lost)" with cogroup?

I am running Spark in standalone mode and using the cogroup function (on two datasets, one 9 GB and the other 110 KB). The job fails with:
15/10/06 14:01:17 WARN HeartbeatReceiver: Removing executor driver with no recent heartbeats: 451457 ms exceeds timeout 120000 ms
15/10/06 14:01:17 ERROR TaskSchedulerImpl: Lost executor driver on localhost: Executor heartbeat timed out after 451457 ms
15/10/06 14:01:17 INFO TaskSetManager: Re-queueing tasks for driver from TaskSet 2.0
15/10/06 14:01:17 WARN TaskSetManager: Lost task 109.0 in stage 2.0 (TID 20111, localhost): ExecutorLostFailure (executor driver lost)
15/10/06 14:01:17 ERROR TaskSetManager: Task 109 in stage 2.0 failed 1 times; aborting job
15/10/06 14:01:17 INFO DAGScheduler: Resubmitted ShuffleMapTask(2, 91), so marking it as still running
15/10/06 14:01:17 WARN TaskSetManager: Lost task 34.0 in stage 2.0 (TID 20036, localhost): ExecutorLostFailure (executor driver lost)
15/10/06 14:01:17 INFO DAGScheduler: Resubmitted ShuffleMapTask(2, 118), so marking it as still running
15/10/06 14:01:17 INFO DAGScheduler: Resubmitted ShuffleMapTask(2, 100), so marking it as still running
15/10/06 14:01:17 INFO DAGScheduler: Resubmitted ShuffleMapTask(2, 76), so marking it as still running
I have 128 GB of RAM and 24 cores. My configuration is:
set("spark.executor.memory","64g")
set("spark.driver.memory","64g")
IntelliJ VM options: -Xmx128G
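The first log line shows the driver's heartbeat gap (451457 ms) blowing well past the default 120000 ms timeout, which usually points to a stalled JVM (often long GC pauses under memory pressure) rather than a network problem. One thing sometimes tried alongside the memory settings above is raising the timeout; this is only a sketch, and the values here are assumptions, not tuned for this workload:

```scala
val conf = new SparkConf()
  .set("spark.executor.memory", "64g")
  .set("spark.driver.memory", "64g")
  // Tolerate longer heartbeat gaps (default is 120s); values are assumptions.
  .set("spark.network.timeout", "600s")
  .set("spark.executor.heartbeatInterval", "60s")
```

Note this only buys time; if the pause is caused by the cogroup itself, the underlying memory pressure still needs to be addressed.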
As you can see from the code, I have partitioned the data into 1000 parts. I also tried 5000 and 10000 partitions, since countByKey is very expensive in my case. From some other StackOverflow posts I have seen the spark.default.parallelism option. How should I tune my configuration? Do I need to add anything more to the IntelliJ VM options? Should I set spark.default.parallelism? My code is:
val emp = sc.textFile("\\text1.txt", 1000)
  .map { line => val s = line.split("\t"); (s(3), s(1)) }
val emp_new = sc.textFile("\\text2.txt", 1000)
  .map { line => val s = line.split("\t"); (s(3), s(1)) }

val cog = emp.cogroup(emp_new)

val skk = cog.flatMap {
  case (key: String, (l1: Iterable[String], l2: Iterable[String])) =>
    for { e1 <- l1.toSeq; e2 <- l2.toSeq } yield ((e1, e2), 1)
}

val com = skk.countByKey()
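For reference, the pipeline above computes a per-key cross-product before counting. A plain-Scala sketch of the same computation, on hypothetical sample data rather than the real files, makes the blow-up explicit:

```scala
// Hypothetical stand-ins for the (key, value) pairs read from the two files.
val empPairs    = Seq(("k1", "a"), ("k1", "b"), ("k2", "c"))
val empNewPairs = Seq(("k1", "x"), ("k2", "y"))

// cogroup: group both sides by key (what emp.cogroup(emp_new) yields per key).
val left  = empPairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
val right = empNewPairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }

// flatMap + countByKey: full cross-product of values per key, then count.
// A key with m left values and n right values emits m * n pairs, which is
// where the memory pressure comes from on frequent (skewed) keys.
val counts = (left.keySet ++ right.keySet).toSeq
  .flatMap { k =>
    for {
      e1 <- left.getOrElse(k, Seq.empty)
      e2 <- right.getOrElse(k, Seq.empty)
    } yield ((e1, e2), 1)
  }
  .groupBy(_._1)
  .map { case (pair, hits) => pair -> hits.size }
```

With a 9 GB left side, a single hot key can emit millions of pairs, and countByKey then materializes the whole result map on the driver, which matches the driver-side heartbeat stall in the log.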
Since your second RDD is very small (110 KB), have you considered collecting and broadcasting it rather than cogrouping the two RDDs (I know it's slightly off-topic, just saying...)?

Yes, I did try several approaches to joining the two RDDs, but each time the reduceByKey part required too much shuffling. How do I determine an approximate value for spark.akka.frameSize? When I set it to 1000 I still got the same error mentioned in the post.

The problem is with the driver, not an executor (so I'm not sure shuffling is the root cause); note the ExecutorLostFailure (executor driver lost). Could you show screenshots of the standalone master and driver web UIs? You don't need to worry about spark.default.parallelism, since you specify the parallelism explicitly via the number of partitions.

@mlee_jordan, was the exception ever resolved? I am facing the same problem here [need help
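To make the broadcast suggestion from the first comment concrete: collect the small side into a map, ship it to every task with sc.broadcast, stream once over the large side, and finish with reduceByKey(_ + _) so the counts stay distributed (countByKey, by contrast, pulls the entire result map onto the driver). The lookup logic itself, sketched with plain Scala collections and hypothetical data since the real files are not available:

```scala
// Hypothetical stand-ins: "big" plays emp (the 9 GB side), "small" plays emp_new.
val big   = Seq(("k1", "a"), ("k1", "b"), ("k2", "c"))
val small = Seq(("k1", "x"), ("k2", "y"))

// What emp_new.groupByKey().collectAsMap() followed by sc.broadcast would ship:
// a read-only key -> values map that every task gets a local copy of.
val smallMap: Map[String, Seq[String]] =
  small.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }

// Map-side join: a single pass over the big side, with no shuffle of the 9 GB
// data. In Spark this flatMap would read bc.value instead of smallMap, and the
// final count would be .reduceByKey(_ + _) rather than this local groupBy.
val joinedCounts = big
  .flatMap { case (k, e1) =>
    smallMap.getOrElse(k, Seq.empty).map(e2 => ((e1, e2), 1))
  }
  .groupBy(_._1)
  .map { case (pair, hits) => pair -> hits.map(_._2).sum }
```

This produces the same (pair, count) result as the cogroup pipeline while avoiding both the shuffle and the driver-side countByKey collection.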
15/10/06 14:01:17 INFO TaskSchedulerImpl: Cancelling stage 2
15/10/06 14:01:17 INFO DAGScheduler: ShuffleMapStage 2 (countByKey at ngram.scala:39) failed in 1020,915 s
15/10/06 14:01:17 INFO DAGScheduler: Job 0 failed: countByKey at ngram.scala:39, took 3025,563964 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 109 in stage 2.0 failed 1 times, most recent failure: Lost task 109.0 in stage 2.0 (TID 20111, localhost): ExecutorLostFailure (executor driver lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)