
Apache Spark 1.3.0: ExecutorLostFailure depending on input file size

Tags: apache-spark, apache-spark-1.3

I am trying to run a simple Python application on a two-node cluster set up in standalone mode: one master and one worker, where the master also acts as a worker.

In the code below, I try to count the occurrences of "cake" in a 500 MB text file, but the job fails with an ExecutorLostFailure.

Interestingly, the application runs fine if I use a 100 MB input file.

I am using the CDH 5.4.4 package version with YARN, and I am running Spark 1.3.0. Each node has 8 GB of memory, and these are some of my settings (a SparkConf sketch of them follows the list):

  • Executor memory: 4g
  • Driver memory: 2g
  • Cores per worker: 1
  • Serializer: Kryo
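For reference, here is a minimal sketch of how the executor-memory and Kryo settings could be expressed through SparkConf. The property names come from the standard Spark 1.x configuration; the question does not show how the settings were actually applied (spark-defaults.conf, spark-env.sh, or command-line flags).

from pyspark import SparkConf, SparkContext

# Sketch only: the same settings expressed through SparkConf.
conf = (SparkConf()
        .setAppName("Simple App")
        .set("spark.executor.memory", "4g")  # executor memory: 4g
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))  # Kryo
# Driver memory (2g) has to be set via --driver-memory or spark-defaults.conf,
# because the driver JVM is already running by the time this code executes.
# "Cores per worker: 1" is a worker-side setting (SPARK_WORKER_CORES in
# spark-env.sh), not a SparkConf property.
sc = SparkContext(conf=conf)
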
SimpleApp.py:

from pyspark import SparkContext, SparkConf

sc = SparkContext(appName="Simple App")

# Read the 500 MB text file from HDFS and count the lines containing "cake".
logFile = "/user/ubuntu/largeTextFile500m.txt"
logData = sc.textFile(logFile)
cakes = logData.filter(lambda s: "cake" in s).count()

print "Number of cakes: %i" % cakes
sc.stop()
Submitting the application:

spark-submit --master spark://master:7077 /home/ubuntu/SimpleApp.py
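For completeness, the memory settings listed above could also be passed directly on the command line with the standard spark-submit flags; this is a sketch, not necessarily how the job was actually launched:

spark-submit --master spark://master:7077 \
  --driver-memory 2g \
  --executor-memory 4g \
  /home/ubuntu/SimpleApp.py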
Excerpt from the logs:



    15/08/13 09:04:59 WARN ThreadLocalRandom: Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
    ...
    15/08/13 09:05:09 ERROR TaskSchedulerImpl: Lost executor 1 on master: remote Akka client disassociated
    15/08/13 09:05:09 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
    15/08/13 09:05:09 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, master): ExecutorLostFailure (executor 1 lost)
    ...
    15/08/13 09:05:09 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
    ...
    15/08/13 09:05:13 ERROR TaskSchedulerImpl: Lost executor 0 on worker: remote Akka client disassociated
    15/08/13 09:05:13 INFO TaskSetManager: Re-queueing tasks for 0 from TaskSet 0.0
    15/08/13 09:05:13 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 5, worker): ExecutorLostFailure (executor 0 lost)
    ...
    15/08/13 09:05:13 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
    ...
    15/08/13 09:05:21 ERROR TaskSchedulerImpl: Lost executor 2 on master: remote Akka client disassociated
    15/08/13 09:05:21 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 0.0
    15/08/13 09:05:21 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 6, master): ExecutorLostFailure (executor 2 lost)
    ...
    15/08/13 09:05:21 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
    ...
    15/08/13 09:05:29 ERROR TaskSchedulerImpl: Lost executor 3 on worker: remote Akka client disassociated
    15/08/13 09:05:29 INFO TaskSetManager: Re-queueing tasks for 3 from TaskSet 0.0
    15/08/13 09:05:29 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 7, worker): ExecutorLostFailure (executor 3 lost)
    ...
    15/08/13 09:05:29 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
    ...
    15/08/13 09:05:29 INFO DAGScheduler: Job 0 failed: count at /home/ubuntu/SimpleApp.py:6, took 28.156765 s
    Traceback (most recent call last):
      File "/home/ubuntu/Michael/SimpleApp2.py", line 6, in 
        cakes = logData.filter(lambda s: "cake" in s).count()
      File "/usr/lib/spark/python/pyspark/rdd.py", line 933, in count
        return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
      File "/usr/lib/spark/python/pyspark/rdd.py", line 924, in sum
        return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
      File "/usr/lib/spark/python/pyspark/rdd.py", line 740, in reduce
        vals = self.mapPartitions(func).collect()
      File "/usr/lib/spark/python/pyspark/rdd.py", line 701, in collect
        bytesInJava = self._jrdd.collect().iterator()
      File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
      File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError15/08/13 09:05:29 INFO DAGScheduler: Executor lost: 3 (epoch 3)
    15/08/13 09:05:29 INFO BlockManagerMasterActor: Trying to remove executor 3 from BlockManagerMaster.
    15/08/13 09:05:29 INFO AppClient$ClientActor: Executor updated: app-20150813090456-0000/5 is now RUNNING
    15/08/13 09:05:29 INFO BlockManagerMasterActor: Removing block manager BlockManagerId(3, worker, 4075)
    : An error occurred while calling o41.collect.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, worker): ExecutorLostFailure (executor 3 lost)
    Driver stacktrace:
            at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
            at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
            at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
            at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
            at scala.Option.foreach(Option.scala:236)
            at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
            at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
            at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
            at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

    15/08/13 09:05:29 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
    15/08/13 09:05:29 INFO AppClient$ClientActor: Executor updated: app-20150813090456-0000/5 is now LOADING


    15/08/12 15:23:28 DEBUG DFSClient: DFSClient seqno: 20 status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 857203
        numAs = logData.filter(lambda s: "cake" in s).count()
      File "/usr/lib/spark/python/pyspark/rdd.py", line 933, in count
        return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
      File "/usr/lib/spark/python/pyspark/rdd.py", line 924, in sum
        return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
      File "/usr/lib/spark/python/pyspark/rdd.py", line 740, in reduce
        vals = self.mapPartitions(func).collect()
      File "/usr/lib/spark/python/pyspark/rdd.py", line 701, in collect
        bytesInJava = self._jrdd.collect().iterator()
      File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
      File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o43.collect.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4, master): ExecutorLostFailure (executor 4 lost)
    Driver stacktrace:
            at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
            at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
            at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
            at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
            at scala.Option.foreach(Option.scala:236)
            at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
            at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
            at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
            at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


Any suggestions?

Are there any messages about this error?

Yes, and the problem was not related to the installation or configuration. I used a text file as input that contained random words separated by spaces, with no newline characters. Because of this, Spark could not split my 500 MB file and tried to process it in a single task.
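As a possible workaround (a sketch only, not verified against this cluster): since sc.textFile splits records on newlines, a newline-free file arrives as one enormous record. Reading it with the new Hadoop API and a space as the record delimiter lets Spark break it into many small records; the delimiter choice and the filter below are assumptions based on the description above.

from pyspark import SparkContext

sc = SparkContext(appName="Simple App")

# Use a space as the record delimiter so the newline-free 500 MB file is
# split into many small word-sized records instead of one giant line.
words = sc.newAPIHadoopFile(
    "/user/ubuntu/largeTextFile500m.txt",
    "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text",
    conf={"textinputformat.record.delimiter": " "})

# Each record is an (offset, text) pair; count the words containing "cake".
cakes = words.filter(lambda kv: "cake" in kv[1]).count()
print "Number of cakes: %i" % cakes
sc.stop()

With word-sized records, no single task needs to hold the whole file in memory, which should avoid the executor losses described above.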