Python PySpark job throws an error

I'm pasting the long stack trace below in case someone has seen this error before. I'm trying to write a simple KNN job with PySpark on an HDFS cluster. The job uses only a handful of small input files, so I don't think this is a memory (space) problem. I don't call broadcast anywhere in my code, so I was surprised to see broadcast.py fail. I do, however, share Python dictionaries across my functions without explicitly broadcasting them.
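
To make that concrete, here is a minimal, hypothetical sketch (not my actual job, and the names are made up) of what I mean by sharing a dictionary without explicitly broadcasting it, next to the explicit sc.broadcast equivalent. As I understand it, PySpark pickles the task function together with anything it closes over and ships it to the workers, so a captured dictionary travels to the executors even though I never call broadcast myself:

    from pyspark import SparkContext

    sc = SparkContext(appName="broadcast-sketch")

    # Hypothetical lookup table built on the driver.
    labels = {1: "a", 2: "b", 3: "c"}

    rdd = sc.parallelize([1, 2, 3])

    # Implicit: the dict is captured in the closure, pickled along with
    # the task function, and shipped to every worker. This is the
    # pattern my job uses.
    implicit = rdd.map(lambda k: labels.get(k)).collect()

    # Explicit: the dict is wrapped in a Broadcast variable and read
    # through .value on the workers.
    b_labels = sc.broadcast(labels)
    explicit = rdd.map(lambda k: b_labels.value.get(k)).collect()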

Can someone help me understand what is going on?

I've also pasted my entire code below (above the stack trace).

Stack trace:

15/05/19 13:44:11 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/spark/conf:/home/hadoop/conf:/home/hadoop/spark/classpath/emr/*:/home/hadoop/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar' as a work-around.
15/05/19 13:44:11 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/spark/conf:/home/hadoop/conf:/home/hadoop/spark/classpath/emr/*:/home/hadoop/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar' as a work-around.
15/05/19 13:44:11 INFO spark.SecurityManager: Changing view acls to: hadoop
15/05/19 13:44:11 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/05/19 13:44:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/05/19 13:44:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/05/19 13:44:12 INFO Remoting: Starting remoting
15/05/19 13:44:12 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ip-.ec2.internal:38952]
15/05/19 13:44:12 INFO util.Utils: Successfully started service 'sparkDriver' on port 38952.
15/05/19 13:44:12 INFO spark.SparkEnv: Registering MapOutputTracker
15/05/19 13:44:12 INFO spark.SparkEnv: Registering BlockManagerMaster
15/05/19 13:44:12 INFO storage.DiskBlockManager: Created local directory at /mnt/spark/spark-7898e98c-1685-450b-a47a-2fbede361cf3/blockmgr-c3ce83af-b195-4ec9-8e1f-8d71ee0589b1
15/05/19 13:44:12 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
15/05/19 13:44:12 INFO spark.HttpFileServer: HTTP File server directory is /mnt/spark/spark-94ffb583-1f85-4505-8d4d-1f104d612be6/httpd-8e8e619f-c4e6-4813-ac0d-b8793e7ff84d
15/05/19 13:44:12 INFO spark.HttpServer: Starting HTTP Server
15/05/19 13:44:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/05/19 13:44:12 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:41841
15/05/19 13:44:12 INFO util.Utils: Successfully started service 'HTTP file server' on port 41841.
15/05/19 13:44:12 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/05/19 13:44:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/05/19 13:44:12 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/19 13:44:12 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/05/19 13:44:12 INFO ui.SparkUI: Started SparkUI at http://ip.ec2.internal:4040
15/05/19 13:44:13 INFO util.Utils: Copying /mnt/user/spark_practice/knn.py to /mnt/spark/spark-afbde2d3-d58a-468f-b84f-c131ecd708cd/userFiles-6918221d-be13-4ce6-adbc-a4fcbd787996/knn.py
15/05/19 13:44:13 INFO spark.SparkContext: Added file file:/mnt/user/spark_practice/knn.py at file:/mnt/user/spark_practice/knn.py with timestamp 1432043053065
15/05/19 13:44:13 INFO executor.Executor: Starting executor ID <driver> on host localhost
15/05/19 13:44:13 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@ip-.ec2.internal:38952/user/HeartbeatReceiver
15/05/19 13:44:13 INFO netty.NettyBlockTransferService: Server created on 44689
15/05/19 13:44:13 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/05/19 13:44:13 INFO storage.BlockManagerMasterActor: Registering block manager localhost:44689 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 44689)
15/05/19 13:44:13 INFO storage.BlockManagerMaster: Registered BlockManager
7412 ==================================================
55134 ==================================================
15/05/19 13:44:16 INFO storage.MemoryStore: ensureFreeSpace(253503) called with curMem=0, maxMem=278302556
15/05/19 13:44:16 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 247.6 KB, free 265.2 MB)
15/05/19 13:44:16 INFO storage.MemoryStore: ensureFreeSpace(19226) called with curMem=253503, maxMem=278302556
15/05/19 13:44:16 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.8 KB, free 265.1 MB)
15/05/19 13:44:16 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:44689 (size: 18.8 KB, free: 265.4 MB)
15/05/19 13:44:16 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/05/19 13:44:16 INFO spark.SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
15/05/19 13:44:17 INFO storage.MemoryStore: ensureFreeSpace(296) called with curMem=272729, maxMem=278302556
15/05/19 13:44:17 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 296.0 B, free 265.1 MB)
15/05/19 13:44:17 INFO storage.MemoryStore: ensureFreeSpace(2504167) called with curMem=273025, maxMem=278302556
15/05/19 13:44:17 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.4 MB, free 262.8 MB)
15/05/19 13:44:17 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:44689 (size: 2.4 MB, free: 263.0 MB)
15/05/19 13:44:17 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/05/19 13:44:17 INFO spark.SparkContext: Created broadcast 1 from broadcast at PythonRDD.scala:399
15/05/19 13:44:17 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
15/05/19 13:44:17 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 77cfa96225d62546008ca339b7c2076a3da91578]
15/05/19 13:44:17 INFO mapred.FileInputFormat: Total input paths to process : 5
15/05/19 13:44:18 INFO storage.MemoryStore: ensureFreeSpace(296) called with curMem=2777192, maxMem=278302556
15/05/19 13:44:18 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 296.0 B, free 262.8 MB)
15/05/19 13:44:18 INFO storage.MemoryStore: ensureFreeSpace(2506273) called with curMem=2777488, maxMem=278302556
15/05/19 13:44:18 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 MB, free 260.4 MB)
15/05/19 13:44:18 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:44689 (size: 2.4 MB, free: 260.6 MB)
15/05/19 13:44:18 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
15/05/19 13:44:18 INFO spark.SparkContext: Created broadcast 2 from reduceByKey at /mnt/user/spark_practice/knn.py:59
15/05/19 13:44:18 INFO spark.SparkContext: Starting job: sortByKey at /mnt/user/spark_practice/knn.py:59
15/05/19 13:44:18 INFO scheduler.DAGScheduler: Registering RDD 4 (reduceByKey at /mnt/user/spark_practice/knn.py:59)
15/05/19 13:44:18 INFO scheduler.DAGScheduler: Got job 0 (sortByKey at /mnt/user/spark_practice/knn.py:59) with 5 output partitions (allowLocal=false)
15/05/19 13:44:18 INFO scheduler.DAGScheduler: Final stage: Stage 1(sortByKey at /mnt/user/spark_practice/knn.py:59)
15/05/19 13:44:18 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 0)
15/05/19 13:44:18 INFO scheduler.DAGScheduler: Missing parents: List(Stage 0)
15/05/19 13:44:18 INFO scheduler.DAGScheduler: Submitting Stage 0 (PairwiseRDD[4] at reduceByKey at /mnt/user/spark_practice/knn.py:59), which has no missing parents
15/05/19 13:44:18 INFO storage.MemoryStore: ensureFreeSpace(5288) called with curMem=5283761, maxMem=278302556
15/05/19 13:44:18 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 5.2 KB, free 260.4 MB)
15/05/19 13:44:18 INFO storage.MemoryStore: ensureFreeSpace(3048) called with curMem=5289049, maxMem=278302556
15/05/19 13:44:18 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 3.0 KB, free 260.4 MB)
15/05/19 13:44:18 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:44689 (size: 3.0 KB, free: 260.6 MB)
15/05/19 13:44:18 INFO storage.BlockManagerMaster: Updated info of block broadcast_3_piece0
15/05/19 13:44:18 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:839
15/05/19 13:44:18 INFO scheduler.DAGScheduler: Submitting 5 missing tasks from Stage 0 (PairwiseRDD[4] at reduceByKey at /mnt/user/spark_practice/knn.py:59)
15/05/19 13:44:18 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 5 tasks
15/05/19 13:44:18 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, ANY, 1407 bytes)
15/05/19 13:44:18 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, ANY, 1407 bytes)
15/05/19 13:44:18 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, ANY, 1407 bytes)
15/05/19 13:44:18 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, ANY, 1407 bytes)
15/05/19 13:44:18 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, ANY, 1407 bytes)
15/05/19 13:44:18 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
15/05/19 13:44:18 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
15/05/19 13:44:18 INFO executor.Executor: Running task 4.0 in stage 0.0 (TID 4)
15/05/19 13:44:18 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
15/05/19 13:44:18 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/05/19 13:44:18 INFO executor.Executor: Fetching file:/mnt/user/spark_practice/knn.py with timestamp 1432043053065
15/05/19 13:44:18 INFO util.Utils: /mnt/user/spark_practice/knn.py has been previously copied to /mnt/spark/spark-afbde2d3-d58a-468f-b84f-c131ecd708cd/userFiles-6918221d-be13-4ce6-adbc-a4fcbd787996/knn.py
15/05/19 13:44:18 INFO rdd.HadoopRDD: Input split: hdfs://10.64.10.43:9000/user/hadoop/user/practice/knn/inputfile
15/05/19 13:44:18 INFO rdd.HadoopRDD: Input split: hdfs://10.64.10.43:9000/user/hadoop/user/practice/knn/inputfile2
15/05/19 13:44:18 INFO rdd.HadoopRDD: Input split: hdfs://10.64.10.43:9000/user/hadoop/user/practice/knn/inputfile3
15/05/19 13:44:18 INFO rdd.HadoopRDD: Input split: hdfs://10.64.10.43:9000/user/hadoop/user/practice/knn/inputfile4
15/05/19 13:44:18 INFO rdd.HadoopRDD: Input split: hdfs://10.64.10.43:9000/user/hadoop/user/practice/knn/inputfile5
15/05/19 13:44:18 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/05/19 13:44:18 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/05/19 13:44:18 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/05/19 13:44:18 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/05/19 13:44:18 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/05/19 13:44:19 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false
15/05/19 13:44:19 INFO metrics.MetricsSaver: Created MetricsSaver j-24SI04Y9O1ZVF:i-58ae838e:SparkSubmit:24683 period:60 /mnt/var/em/raw/i-58ae838e_20150519_SparkSubmit_24683_raw.bin
15/05/19 13:44:19 ERROR executor.Executor: Exception in task 4.0 in stage 0.0 (TID 4)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/spark/python/pyspark/worker.py", line 90, in main
    command = pickleSer.loads(command.value)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 106, in value
    self._value = self.load(self._path)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 87, in load
    with open(path, 'rb', 1 << 20) as f:
IOError: [Errno 2] No such file or directory: '/mnt/spark/spark-ea646b94-3f68-47a5-8e1c-b23ac0799718/pyspark-7d842875-fae2-4563-b367-92d89d292b60/tmpjlEm1f'

    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:311)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/05/19 13:44:19 ERROR executor.Executor: Exception in task 2.0 in stage 0.0 (TID 2)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/spark/python/pyspark/worker.py", line 90, in main
    command = pickleSer.loads(command.value)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 106, in value
    self._value = self.load(self._path)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 87, in load
    with open(path, 'rb', 1 << 20) as f:
IOError: [Errno 2] No such file or directory: '/mnt/spark/spark-ea646b94-3f68-47a5-8e1c-b23ac0799718/pyspark-7d842875-fae2-4563-b367-92d89d292b60/tmpjlEm1f'

    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:311)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/05/19 13:44:19 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/spark/python/pyspark/worker.py", line 90, in main
    command = pickleSer.loads(command.value)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 106, in value
    self._value = self.load(self._path)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 87, in load
    with open(path, 'rb', 1 << 20) as f:
IOError: [Errno 2] No such file or directory: '/mnt/spark/spark-ea646b94-3f68-47a5-8e1c-b23ac0799718/pyspark-7d842875-fae2-4563-b367-92d89d292b60/tmpjlEm1f'

    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:311)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/05/19 13:44:19 ERROR executor.Executor: Exception in task 3.0 in stage 0.0 (TID 3)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/spark/python/pyspark/worker.py", line 90, in main
    command = pickleSer.loads(command.value)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 106, in value
    self._value = self.load(self._path)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 87, in load
    with open(path, 'rb', 1 << 20) as f:
IOError: [Errno 2] No such file or directory: '/mnt/spark/spark-ea646b94-3f68-47a5-8e1c-b23ac0799718/pyspark-7d842875-fae2-4563-b367-92d89d292b60/tmpjlEm1f'

    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:311)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/05/19 13:44:19 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/spark/python/pyspark/worker.py", line 90, in main
    command = pickleSer.loads(command.value)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 106, in value
    self._value = self.load(self._path)
  File "/home/hadoop/spark/python/pyspark/broadcast.py", line 87, in load
    with open(path, 'rb', 1 << 20) as f:
IOError: [Errno 2] No such file or directory: '/mnt/spark/spark-ea646b94-3f68-47a5-8e1c-b23ac0799718/pyspark-7d842875-fae2-4563-b367-92d89d292b60/tmpjlEm1f'