Why does Spark standalone report "ExecutorLostFailure (executor driver lost)" with cogroup?

Tags: scala, apache-spark

I am running Spark's cogroup function in standalone mode on two datasets, one 9 GB and the other 110 KB, as shown in the code below. Partway through, the job fails with:

15/10/06 14:01:17 WARN HeartbeatReceiver: Removing executor driver with no recent heartbeats: 451457 ms exceeds timeout 120000 ms
15/10/06 14:01:17 ERROR TaskSchedulerImpl: Lost executor driver on localhost: Executor heartbeat timed out after 451457 ms
15/10/06 14:01:17 INFO TaskSetManager: Re-queueing tasks for driver from TaskSet 2.0
15/10/06 14:01:17 WARN TaskSetManager: Lost task 109.0 in stage 2.0 (TID 20111, localhost): ExecutorLostFailure (executor driver lost)
15/10/06 14:01:17 ERROR TaskSetManager: Task 109 in stage 2.0 failed 1 times; aborting job
15/10/06 14:01:17 INFO DAGScheduler: Resubmitted ShuffleMapTask(2, 91), so marking it as still running
15/10/06 14:01:17 WARN TaskSetManager: Lost task 34.0 in stage 2.0 (TID 20036, localhost): ExecutorLostFailure (executor driver lost)
15/10/06 14:01:17 INFO DAGScheduler: Resubmitted ShuffleMapTask(2, 118), so marking it as still running
15/10/06 14:01:17 INFO DAGScheduler: Resubmitted ShuffleMapTask(2, 100), so marking it as still running
15/10/06 14:01:17 INFO DAGScheduler: Resubmitted ShuffleMapTask(2, 76), so marking it as still running
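For reference, the 120000 ms in the log is Spark's default spark.network.timeout. A hedged sketch of the settings involved (the values are illustrative; raising them only buys time if the driver is stalled, for example in a long GC pause):

import org.apache.spark.SparkConf

// Sketch only: a heartbeat miss of 451457 ms suggests the driver JVM was
// unresponsive, so raising timeouts treats the symptom, not the cause.
val timeoutConf = new SparkConf()
  .set("spark.network.timeout", "600s")            // default 120s, matching the log
  .set("spark.executor.heartbeatInterval", "60s")  // default 10s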
I have 128 GB of RAM and 24 cores. My configuration is:

set("spark.executor.memory","64g")
set("spark.driver.memory","64g")
IntelliJ VM options:
-Xmx128G
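For context, a minimal sketch of how these settings are typically wired together when launching from the IDE (the app name and master URL are assumptions; note that spark.driver.memory set in code has no effect once the driver JVM is already running, which is why the -Xmx flag matters here):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch under the assumption of a single 24-core machine run from IntelliJ.
val conf = new SparkConf()
  .setAppName("cogroup-job")           // hypothetical name
  .setMaster("local[24]")              // assumed: "localhost"/"executor driver" in the log points at local mode
  .set("spark.executor.memory", "64g")
  .set("spark.driver.memory", "64g")   // ignored once the driver JVM has started; -Xmx applies instead
val sc = new SparkContext(conf)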

As you can see from the code, I have partitioned the data into 1000 parts. I also tried 5000 and 10000 partitions, since in my case countByKey is very expensive.

From some other StackOverflow posts I have seen the spark.default.parallelism option. How should I tune my configuration? Do I need to add anything else to the IntelliJ VM options? Should I set spark.default.parallelism?

// Read both tab-separated files as (key, value) pairs, keyed on column 3
val emp     = sc.textFile("\\text1.txt", 1000).map { line => val s = line.split("\t"); (s(3), s(1)) }
val emp_new = sc.textFile("\\text2.txt", 1000).map { line => val s = line.split("\t"); (s(3), s(1)) }

// Group both RDDs by key, then emit every (e1, e2) combination per key
val cog = emp.cogroup(emp_new)
val skk = cog.flatMap {
  case (key: String, (l1: Iterable[String], l2: Iterable[String])) =>
    for { e1 <- l1.toSeq; e2 <- l2.toSeq } yield ((e1, e2), 1)
}
val com = skk.countByKey()
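An aside not in the original post: countByKey collects the entire result map onto the driver, and the cogroup above can emit up to |l1| * |l2| pairs per key, so that map can be enormous. A sketch that keeps the counting distributed (the output path is hypothetical):

// Sketch: aggregate on the executors and keep the result as an RDD;
// count or save it rather than materializing a huge Map on the driver.
val comRdd = skk.reduceByKey(_ + _)
println(comRdd.count())                 // number of distinct (e1, e2) pairs
comRdd.saveAsTextFile("\\ngram_counts") // hypothetical output path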


Comments:

Since your second RDD is very small (110 KB), have you considered collecting and broadcasting it rather than cogrouping the two RDDs? (I know that's a bit off topic, just saying...)

Yes, I did try several approaches to joining the two RDDs, but every time the reduceByKey part required too much shuffling. How do I determine an approximate value for spark.akka.frameSize? When I set it to 1000 I still got the same error mentioned in the post.

The problem is with the driver, not an executor (so shuffling may not be the root cause). Note the ExecutorLostFailure (executor driver lost). Could you show screenshots of the standalone master and driver web UIs? You don't need to worry about spark.default.parallelism, since you already specify it explicitly via the number of partitions.

@mlee_jordan, was the exception ever resolved? I am facing the same problem here [...]
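A minimal sketch of the broadcast idea from the first comment, assuming the 110 KB side fits comfortably in driver memory (variable names are illustrative):

// Collect the small dataset as a key -> values map and broadcast it,
// replacing the cogroup shuffle with a map-side join.
val small = sc.textFile("\\text2.txt")
  .map { line => val s = line.split("\t"); (s(3), s(1)) }
  .groupByKey()
  .collectAsMap()                       // ~110 KB, safe to collect on the driver
val smallB = sc.broadcast(small)

val counts = sc.textFile("\\text1.txt", 1000)
  .map { line => val s = line.split("\t"); (s(3), s(1)) }
  .flatMap { case (k, e1) =>
    smallB.value.getOrElse(k, Iterable.empty[String]).map(e2 => ((e1, e2), 1))
  }
  .reduceByKey(_ + _)                   // distributed count instead of countByKey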
For completeness, the end of the failure log and the driver stack trace:

15/10/06 14:01:17 INFO TaskSchedulerImpl: Cancelling stage 2
15/10/06 14:01:17 INFO DAGScheduler: ShuffleMapStage 2 (countByKey at ngram.scala:39) failed in 1020,915 s
15/10/06 14:01:17 INFO DAGScheduler: Job 0 failed: countByKey at ngram.scala:39, took 3025,563964 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 109 in stage 2.0 failed 1 times, most recent failure: Lost task 109.0 in stage 2.0 (TID 20111, localhost): ExecutorLostFailure (executor driver lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)