Apache spark pyspark什么时候会失败;java.lang.AssertionError:断言失败;从BlockInfo.checkInvariants?

Apache spark pyspark什么时候会失败;java.lang.AssertionError:断言失败;从BlockInfo.checkInvariants?,apache-spark,pyspark,Apache Spark,Pyspark,我正在使用pyspark并收到以下消息: 17/12/03 11:57:48 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 1800, 172.31.27.9, executor 0): java.lang.AssertionError: assertion failed at scala.Predef$.assert(Predef.scala:156) at org.apache.spark.storage.BlockInfo.che

我正在使用pyspark并收到以下消息:

17/12/03 11:57:48 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 1800, 172.31.27.9, executor 0): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

17/12/03 11:57:48 INFO TaskSetManager: Starting task 0.1 in stage 5.0 (TID 1801, 172.31.27.9, executor 0, partition 0, PROCESS_LOCAL, 4871 bytes)
17/12/03 11:57:48 INFO TaskSetManager: Lost task 0.1 in stage 5.0 (TID 1801) on 172.31.27.9, executor 0: java.lang.AssertionError (assertion failed) [duplicate 1]
17/12/03 11:57:48 INFO TaskSetManager: Starting task 0.2 in stage 5.0 (TID 1802, 172.31.27.9, executor 0, partition 0, PROCESS_LOCAL, 4871 bytes)
17/12/03 11:57:48 INFO TaskSetManager: Lost task 0.2 in stage 5.0 (TID 1802) on 172.31.27.9, executor 0: java.lang.AssertionError (assertion failed) [duplicate 2]
17/12/03 11:57:48 INFO TaskSetManager: Starting task 0.3 in stage 5.0 (TID 1803, 172.31.27.9, executor 0, partition 0, PROCESS_LOCAL, 4871 bytes)
17/12/03 11:57:48 INFO TaskSetManager: Lost task 0.3 in stage 5.0 (TID 1803) on     172.31.27.9, executor 0: java.lang.AssertionError (assertion failed) [duplicate 3]
17/12/03 11:57:48 ERROR TaskSetManager: Task 0 in stage 5.0 failed 4 times; aborting job
17/12/03 11:57:48 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 
17/12/03 11:57:48 INFO TaskSchedulerImpl: Cancelling stage 5
17/12/03 11:57:48 INFO DAGScheduler: ResultStage 5 (runJob at PythonRDD.scala:446) failed in 0.078 s due to Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 1803, 172.31.27.9, executor 0): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366)
at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361)
at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
其他人似乎也有类似的情况
TaskSetManager:5.0阶段的任务0失败了4次;正在中止作业
,但我特别关注以下消息:
。。。阶段5.0中的任务0失败4次,最近一次失败:阶段5.0中的任务0.3丢失(TID 1803172.31.27.9,执行器0):java.lang.AssertionError:assertion失败

我怀疑这个问题在写入RDD时会发生,写入时可能会出错,Spark以后很难解析。 我正在使用
ALS
库和
Rating
对象。因此,在保存到RDD之前,我更方便地将RDD映射到

data.map(lambda x: (x.user, x.product, x.rating)).saveAsTextFile ("hdfs://"+master_ip+":9000/RDD/data")
并作为

data = sc.textFile("hdfs://"+master_ip+":9000/RDD/data")
data = data.map(lambda x: x[1:-1]).map(lambda x: x.split(", ")).\
              map(lambda x: Rating(int(x[0]), int(x[1]), float(x[2])))
我很好奇,因为这些错误消息并不是每次都出现,也不是总是可以复制的。 我使用的是spark 2.2.0和hapdoop 2.7。以前有人见过这个吗


谢谢

因为这是一个不应该发生的不变量。这是可复制的吗?介意分享一下这个问题吗?什么是主URL/群集管理器?@Jacek不,它并不总是可复制的!我在问题中添加了部分代码。我怀疑一些可能的原因,但仍在测试中。谢谢在出现问题之前,您还有其他错误吗?看起来执行者可能被纱线杀死(可能是因为内存限制?)没有。除了留言,一切都很顺利。现在我正在测试。我重写了RDD并再次阅读。看来我没有这样的问题。我在AWS使用t2.xlarge处理每个16GB的实例。记忆不应该是问题。我会说,问题并没有消失。但是频率变小了。