Apache Spark: WARN ReliableDeliverySupervisor: Association with remote system has failed, address is now gated for [5000] ms. Reason: [Disassociated]

Tags: apache-spark, apache-spark-sql, emr

I am running the following code on Spark on AWS (EMR):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class Wiki(project: String, title: String, count: Int, byte_size: String)

val data = sc.textFile("s3n://+++/").map(_.split(" ")).filter(_.size == 4).map(p => Wiki(p(0), p(1), p(2).trim.toInt, p(3)))

val df = data.toDF()
df.printSchema()

val en_agg_df = df.filter("project = 'en'").select("title","count").groupBy("title").sum().collect()
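The line-parsing step above can be checked locally before submitting to the cluster. A minimal sketch of the same logic in plain Scala, assuming the same four-field "project title count byte_size" record layout as the question (the helper `parseLine` is illustrative, not part of the original code):

```scala
case class Wiki(project: String, title: String, count: Int, byte_size: String)

// Parse one space-separated record; drop malformed lines, as filter(_.size == 4) does.
def parseLine(line: String): Option[Wiki] = line.split(" ") match {
  case Array(project, title, count, size) => Some(Wiki(project, title, count.trim.toInt, size))
  case _                                  => None
}
```

This mirrors the RDD pipeline's `split`/`filter`/`map` chain, so a handful of sample lines can be validated without touching S3.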
After running for about two hours, the following errors appear:

WARN ReliableDeliverySupervisor: Association with remote system    [akka.tcp://sparkYarnAM@172.31.14.190:42514] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
15/10/15 17:38:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster has disassociated: 172.31.14.190:42514
15/10/15 17:38:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster has disassociated: 172.31.14.190:42514
15/10/15 17:38:36 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@ip-172-31-14-190.ap-northeast-1.compute.internal:43340] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
15/10/15 17:38:36 ERROR YarnScheduler: Lost executor 1 on ip-172-31-14-190.ap-northeast-1.compute.internal: remote Rpc client disassociated
15/10/15 17:38:36 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
15/10/15 17:38:36 WARN TaskSetManager: Lost task 4736.0 in stage 0.0 (TID 4736, ip-172-31-14-190.ap-northeast-1.compute.internal): ExecutorLostFailure (executor 1 lost)
15/10/15 17:38:36 INFO DAGScheduler: Executor lost: 1 (epoch 0)
15/10/15 17:38:36 INFO BlockManagerMasterEndpoint: Trying to remove   executor 1 from BlockManagerMaster.
15/10/15 17:38:36 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, ip-172-31-14-190.ap-northeast-1.compute.internal, 58890)
15/10/15 17:38:36 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
15/10/15 17:38:36 ERROR YarnScheduler: Lost executor 2 on ip-172-31-14-190.ap-northeast-1.compute.internal: remote Rpc client disassociated
15/10/15 17:38:36 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@ip-172-31-14-190.ap-northeast-1.compute.internal:60961] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
15/10/15 17:38:36 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 0.0
15/10/15 17:38:36 WARN TaskSetManager: Lost task 4735.0 in stage 0.0 (TID 4735, ip-172-31-14-190.ap-northeast-1.compute.internal): ExecutorLostFailure (executor 2 lost)
15/10/15 17:38:36 INFO DAGScheduler: Executor lost: 2 (epoch 0)
15/10/15 17:38:36 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
15/10/15 17:38:36 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, ip-172-31-14-190.ap-northeast-1.compute.internal, 58811)
15/10/15 17:38:36 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor

What does this mean, and how can I fix it?

The answer appears to have been given in the comments:

  It seems to be out-of-memory on the executors, because if I add more machines to the cluster, everything works fine.

From the comment thread:

Christopher: The executors may have run out of memory. You should check the container logs for one of the lost executors, and possibly the YARN NodeManager logs on the node it ran on.
OP: @Christopher Thanks very much for your comment!
Christopher: Did it work, or did you find more error information?
OP: @Christopher I think you are right. It seems to be out-of-memory on the executors, because if I add more machines to the cluster, everything works fine.
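If the executors are indeed running out of memory, the usual fix besides adding machines is to give each executor more heap. A minimal sketch for a Spark 1.x-on-YARN setup like the one in the question; the application name and sizes here are illustrative assumptions, not tuned values:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative settings only; appropriate sizes depend on the instance type.
val conf = new SparkConf()
  .setAppName("wiki-pagecount-agg")
  .set("spark.executor.memory", "4g")                 // heap per executor
  .set("spark.yarn.executor.memoryOverhead", "1024")  // off-heap headroom in MB (Spark 1.x key)
val sc = new SparkContext(conf)
```

The same settings can be passed on the command line with `spark-submit --executor-memory 4g`. YARN kills containers that exceed their allocation, which shows up on the driver as exactly this kind of "remote Rpc client disassociated" / ExecutorLostFailure, so the NodeManager logs mentioned in the comments are the place to confirm the cause.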