Scala: java.lang.OutOfMemoryError when saving a Spark MLlib model to disk


I am trying to run LDA on a very small dataset of about 1000 documents. The LDA itself runs fine, and I am also able to save the model.

If I run the program without calling ldaModel.save(), I get the following at the end:

16/03/13 14:26:52 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:53759
16/03/13 14:26:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/03/13 14:26:52 INFO MemoryStore: MemoryStore cleared
16/03/13 14:26:52 INFO BlockManager: BlockManager stopped
16/03/13 14:26:52 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/13 14:26:52 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/03/13 14:26:52 INFO SparkContext: Successfully stopped SparkContext
16/03/13 14:26:52 INFO ShutdownHookManager: Shutdown hook called
16/03/13 14:26:52 INFO ShutdownHookManager: Deleting directory /tmp/spark-753c7923-b623-45a7-afd1-5738766d7571
However, if I do save the model, I get the following at the end of the output:

16/03/13 14:44:01 INFO deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
16/03/13 14:44:01 INFO deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
16/03/13 14:44:01 INFO deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
16/03/13 14:44:01 INFO deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
16/03/13 14:44:01 INFO FileOutputCommitter: Saved output of task 'attempt_201603131444_0041_m_000000_151' to file:/tmp/tempLDA.model/metadata/_temporary/0/task_201603131444_0041_m_000000
16/03/13 14:44:01 INFO SparkHadoopMapRedUtil: attempt_201603131444_0041_m_000000_151: Committed
16/03/13 14:44:01 INFO Executor: Finished task 0.0 in stage 41.0 (TID 151). 873 bytes result sent to driver
16/03/13 14:44:01 INFO TaskSetManager: Finished task 0.0 in stage 41.0 (TID 151) in 85 ms on localhost (1/1)
16/03/13 14:44:01 INFO TaskSchedulerImpl: Removed TaskSet 41.0, whose tasks have all completed, from pool 
16/03/13 14:44:01 INFO DAGScheduler: ResultStage 41 (saveAsTextFile at LDAModel.scala:433) finished in 0.085 s
16/03/13 14:44:01 INFO DAGScheduler: Job 39 finished: saveAsTextFile at LDAModel.scala:433, took 0.116725 s
16/03/13 14:44:01 INFO BlockManagerInfo: Removed broadcast_53_piece0 on localhost:44879 in memory (size: 16.4 KB, free: 1087.1 MB)
16/03/13 14:44:01 INFO ContextCleaner: Cleaned accumulator 44
Exception in thread "main" 16/03/13 14:44:02 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:44879 in memory (size: 10.0 KB, free: 1087.1 MB)

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"

Process finished with exit code 1
In the second case the model does get saved, but the program still ends with an
OutOfMemoryError

What should I do to fix this?
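For context, here is a minimal sketch of the kind of driver program being described. This is an assumption on my part, since the post does not include its code: the corpus construction, `setK(10)`, and the paths are placeholders, but the train-then-save flow matches the `saveAsTextFile at LDAModel.scala:433` lines in the log (spark.mllib 1.x API).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

object LDAExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LDAExample"))

    // Hypothetical corpus: RDD[(Long, Vector)] of (document id, term-count vector),
    // one space-separated row of term counts per line.
    val corpus = sc.textFile("docs.txt").zipWithIndex.map { case (line, id) =>
      (id, Vectors.dense(line.split(" ").map(_.toDouble)))
    }

    val ldaModel = new LDA().setK(10).run(corpus)

    // Training and this save both complete (the metadata commit appears in the log);
    // the OutOfMemoryError is only thrown afterwards, as the program shuts down.
    ldaModel.save(sc, "/tmp/tempLDA.model")

    sc.stop()
  }
}
```

Note that when running locally, driver heap size has to be set before the JVM starts (for example `spark-submit --driver-memory 4g`, or the JVM `-Xmx` option in the IDE run configuration); setting `spark.driver.memory` on the `SparkConf` inside `main` has no effect in that case.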