
Scala: Why does Spark fail with a FetchFailed error?

Tags: scala, apache-spark, mesos, apache-zeppelin

I'm using Apache Zeppelin on Apache Mesos, with 4 nodes and a total capacity of 210 GB.

My Spark job correlates a small dataset of transactions with a large dataset of events. I want to match each transaction with the closest event, based on time and ID (event time to transaction time, ID to ID).

I get the following error:

FetchFailed(null, shuffleId=1, mapId=-1, reduceId=20,
  message=org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:542)
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:538)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:538)
    at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:140)
    at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:136)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:136)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Here is my algorithm:

// Group events by (id, truncated timestamp); `cdr` in the original snippet
// is presumably a typo for `evt`.
val groupRDD = event
    .map { evt => ((evt.id, evt.date_time.toString.dropRight(8)), evt) }
    .groupByKey(new HashPartitioner(128))
    .persist(StorageLevel.MEMORY_AND_DISK_SER)

// The right outer join keeps every transaction, even those with no matching events.
val joinedRDD = groupRDD.rightOuterJoin(
    transactions.keyBy { transac => (transac.id, transac.dateTime.toString.dropRight(8)) })

val result = joinedRDD.mapValues { case (a, b) =>
    // When no event matched, fall back to a placeholder GeoLoc.
    val goodTransac = a.getOrElse(List(GeoLoc("", 0L, "", "", "", "", "")))
        .reduce((v1, v2) => minDelay(b.dateTime, v1, v2))
    SomeClass(b.id, b....., goodTransac.date_time, .....) // fields elided in the original
}
The groupByKey shouldn't group too many elements (50 at most per key); a quick way to verify that is sketched below.
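A minimal sanity check for that claim, assuming the groupRDD defined above (a single oversized group is enough to overwhelm one reducer):

// Largest number of events grouped under one key; a value far above 50
// would point at a skewed key rather than a general memory problem.
val maxGroupSize = groupRDD.mapValues(_.size).values.max()
println(s"Largest group: $maxGroupSize events")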

I noticed that the error occurs when memory runs short, so I decided to persist serialized to both RAM and disk and to switch the serializer to Kryo. I also reduced spark.memory.storageFraction to 0.2 to leave more room for processing.
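For reference, a minimal sketch of that configuration; the 0.2 value comes from this question, and registering GeoLoc with Kryo is an optional extra, not something stated above:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.memory.storageFraction", "0.2") // shrink the storage share of unified memory
  .registerKryoClasses(Array(classOf[GeoLoc])) // optional: faster, more compact Kryo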

When I check the web UI, I can see GC taking more and more time during processing. When the job finally fails, GC takes 20 minutes out of a 22-minute runtime, though not on all the workers.


I've already checked that, but my cluster still has plenty of RAM: about 90 GB free on Mesos.

What I would do is check the number of partitions of the event RDD and the number of partitions after groupByKey. Using StorageLevel.MEMORY_AND_DISK_SER will require more IO, which can slow the executors down, and the SER part can lead to longer GCs (after all, the datasets live in memory and also have to be serialized, which almost doubles the memory requirement).
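A minimal sketch of that first check, using the RDD names from the question; too few partitions means oversized shuffle blocks and memory pressure on the executors:

// Partition counts before and after the shuffle.
println(s"event partitions:    ${event.partitions.length}")
println(s"groupRDD partitions: ${groupRDD.partitions.length}")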

I'd strongly recommend not using MEMORY_AND_DISK_SER at this point.
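Applied to the question's code, that change is small (a sketch; MEMORY_ONLY is Spark's default storage level):

// Keep the grouped data deserialized and memory-only; Spark will
// recompute evicted partitions instead of serializing them to disk.
val groupRDD = event
    .map { evt => ((evt.id, evt.date_time.toString.dropRight(8)), evt) }
    .groupByKey(new HashPartitioner(128))
    .cache() // equivalent to persist(StorageLevel.MEMORY_ONLY)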

I'd also review the dependency graph of the result RDD to see how many shuffles and partitions are used in every stage:

result.toDebugString
There are only a few places where things can go wrong.
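For instance, printed on the driver:

// Each indentation level in the output marks a shuffle (stage) boundary,
// and the (N) prefix on every line is that RDD's partition count.
println(result.toDebugString)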


P.S. Attaching screenshots of the Jobs, Stages, Storage, and Executors pages from the web UI would be very helpful for narrowing down the root cause.

How do you know it's caused by long GCs? Are there any other errors? How do you submit the job, and what type of cluster is it (i.e. standalone, YARN, or Mesos)?

I can see that GC sometimes takes too much time: 20 minutes out of a 22-minute runtime. I'm using Zeppelin. I've checked that link, but my cluster still has enough memory: I have 4 nodes with 210 GB in total, and there are still 90 GB free on Mesos.

Could you include screenshots of the Jobs and Stages pages from the web UI?