Apache spark EMR Spark 2.1.0进程在org.apache.Spark.unsafe.Platform.copyMemory(Platform.java:210)上卡住

Apache spark EMR Spark 2.1.0进程在org.apache.Spark.unsafe.Platform.copyMemory(Platform.java:210)上卡住,apache-spark,spark-dataframe,emr,amazon-emr,parquet,Apache Spark,Spark Dataframe,Emr,Amazon Emr,Parquet,我有一个EMR应用程序,它读取json文件,将层次结构“分解”成一个关系表,然后将其作为拼花写出来。源和目标是HDF。应用程序已经正常运行了大约一个月,今天突然——我假设是由于一个新的源数据集——它没有终止。日志表明,它在某个点被此驱动程序输出卡住: 17/04/18 01:41:14 INFO DAGScheduler: Final stage: ResultStage 4 (save at NativeMethodAccessorImpl.java:0) 17/04/18 01:41:14

我有一个EMR应用程序,它读取json文件,将层次结构“分解”成一个关系表,然后将其作为拼花写出来。源和目标是HDF。应用程序已经正常运行了大约一个月,今天突然——我假设是由于一个新的源数据集——它没有终止。日志表明,它在某个点被此驱动程序输出卡住:

17/04/18 01:41:14 INFO DAGScheduler: Final stage: ResultStage 4 (save at NativeMethodAccessorImpl.java:0)
17/04/18 01:41:14 INFO DAGScheduler: Parents of final stage: List()
17/04/18 01:41:14 INFO DAGScheduler: Missing parents: List()
17/04/18 01:41:14 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[31] at save at NativeMethodAccessorImpl.java:0), which has no missing parents
17/04/18 01:41:14 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 170.5 KB, free 2.2 GB)
17/04/18 01:41:14 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 65.2 KB, free 2.2 GB)
17/04/18 01:41:14 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on xxx.xxx.xxx.xx:37907 (size: 65.2 KB, free: 2.2 GB)
17/04/18 01:41:14 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:996
17/04/18 01:41:14 INFO DAGScheduler: Submitting 9 missing tasks from ResultStage 4 (MapPartitionsRDD[31] at save at NativeMethodAccessorImpl.java:0)
17/04/18 01:41:14 INFO YarnScheduler: Adding task set 4.0 with 9 tasks
17/04/18 01:41:14 INFO TaskSetManager: Starting task 1.0 in stage 4.0 (TID 72, xxx.xxx.xx.xx.xx, executor 12, partition 1, NODE_LOCAL, 8184 bytes)
17/04/18 01:41:14 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 73, xxx.xxx.xx.xx.xx, executor 13, partition 0, NODE_LOCAL, 7967 bytes)
17/04/18 01:41:14 INFO TaskSetManager: Starting task 2.0 in stage 4.0 (TID 74, xxx.xxx.xx.xx.xx, executor 14, partition 2, NODE_LOCAL, 8181 bytes)
17/04/18 01:41:14 INFO TaskSetManager: Starting task 6.0 in stage 4.0 (TID 75, xxx.xxx.xx.xx.xx, executor 16, partition 6, NODE_LOCAL, 8400 bytes)
17/04/18 01:41:14 INFO TaskSetManager: Starting task 7.0 in stage 4.0 (TID 76, xxx.xxx.xx.xx.xx, executor 10, partition 7, NODE_LOCAL, 8398 bytes)
17/04/18 01:41:14 INFO TaskSetManager: Starting task 3.0 in stage 4.0 (TID 77, xxx.xxx.xx.xx.xx, executor 11, partition 3, NODE_LOCAL, 8182 bytes)
17/04/18 01:41:14 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on xxx.xxx.xx.xx.xx:46030 (size: 65.2 KB, free: 4.0 GB)
17/04/18 01:41:14 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on xxx.xxx.xx.xx.xx:40494 (size: 65.2 KB, free: 4.0 GB)
17/04/18 01:41:14 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on xxx.xxx.xx.xx.xx:35861 (size: 65.2 KB, free: 4.0 GB)
17/04/18 01:41:14 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on xxx.xxx.xx.xx.xx:34157 (size: 65.2 KB, free: 4.0 GB)
17/04/18 01:41:14 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on xxx.xxx.xx.xx.xx:43202 (size: 65.2 KB, free: 4.0 GB)
17/04/18 01:41:14 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on xxx.xxx.xx.xx.xx:46053 (size: 65.2 KB, free: 4.0 GB)
17/04/18 01:41:14 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx.xxx.xx.xx.xx:46030 (size: 28.0 KB, free: 4.0 GB)
17/04/18 01:41:15 INFO ExecutorAllocationManager: Requesting 9 new executors because tasks are backlogged (new desired total will be 9)
17/04/18 01:41:15 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx.xxx.xx.xx.xx:34157 (size: 28.0 KB, free: 4.0 GB)
17/04/18 01:41:15 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx.xxx.xx.xx.xx:40494 (size: 28.0 KB, free: 4.0 GB)
17/04/18 01:41:15 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx.xxx.xx.xx.xx:35861 (size: 28.0 KB, free: 4.0 GB)
17/04/18 01:41:15 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx.xxx.xx.xx.xx:46053 (size: 28.0 KB, free: 4.0 GB)
17/04/18 01:41:15 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx.xxx.xx.xx.xx:43202 (size: 28.0 KB, free: 4.0 GB)
17/04/18 01:41:17 INFO TaskSetManager: Starting task 4.0 in stage 4.0 (TID 78, xxx.xxx.xx.xx.xx, executor 15, partition 4, RACK_LOCAL, 8400 bytes)
17/04/18 01:41:17 INFO TaskSetManager: Starting task 5.0 in stage 4.0 (TID 79, xxx.xxx.xx.xx.xx, executor 9, partition 5, RACK_LOCAL, 8400 bytes)
17/04/18 01:41:17 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on xxx.xxx.xx.xx.xx:34045 (size: 65.2 KB, free: 4.0 GB)
17/04/18 01:41:17 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on xxx.xxx.xx.xx.xx:43887 (size: 65.2 KB, free: 4.0 GB)
17/04/18 01:41:18 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx.xxx.xx.xx.xx:34045 (size: 28.0 KB, free: 4.0 GB)
17/04/18 01:41:18 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx.xxx.xx.xx.xx:43887 (size: 28.0 KB, free: 4.0 GB)
当我连接到集群中的一个节点并运行jstack时,我会得到以下堆栈跟踪(其中一些看起来不太有趣):


任何建议都将不胜感激。

FWIW,根据API,“处于可运行状态的线程正在Java虚拟机中执行,但它可能正在等待来自操作系统的其他资源,如处理器。”已打开以进行跟踪this@gae123你有没有找到解决问题的方法?FWIW,根据API,“处于可运行状态的线程正在Java虚拟机中执行,但它可能正在等待来自操作系统的其他资源,如处理器。”已打开以跟踪this@gae123你有没有找到解决问题的办法?
"Executor task launch worker-0" #39 daemon prio=5 os_prio=0 tid=0x00007f6210352800 nid=0x4542 runnable [0x00007f61f52b3000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.spark.unsafe.Platform.copyMemory(Platform.java:210)
    at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.writeToMemory(UnsafeArrayData.java:311)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply6_2$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:111)
    at org.apache.spark.sql.execution.GenerateExec$$anonfun$doExecute$1$$anonfun$apply$9.apply(GenerateExec.scala:109)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
    at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:211)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)