Spark: Frequent pattern mining: issues in saving the results


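I am using Spark's FP-growth algorithm. I was getting an OOM error when collecting the results, so I changed my code to save the results to a text file on HDFS instead of collecting them on the driver node. Here is the relevant code: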

// Model building:
import org.apache.spark.mllib.fpm.FPGrowth

val fpg = new FPGrowth()
  .setMinSupport(0.01)
  .setNumPartitions(10)
val model = fpg.run(transaction_distinct)
Here is a transformation that should give me an RDD[String]:

val mymodel = model.freqItemsets.map { itemset =>
  val model_res = itemset.items.mkString("[", ",", "]") + ", " + itemset.freq
  model_res
}
Then I save the model results. Unfortunately, this is really slow:

mymodel.saveAsTextFile("fpm_model")
I get the following errors:

16/02/04 14:47:28 ERROR ErrorMonitor: AssociationError[akka.tcp://sparkDriver@ipaddress:46811] -> [akka.tcp://sparkExecutor@hostname:39720]: Error [Association failed with [akka.tcp://sparkExecutor@hostname:39720]][akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@hostname:39720]

Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hostname/ipaddress:39720] akka.event.Logging$Error$NoCause$
16/02/04 14:47:28 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, hostname, 58683)
16/02/04 14:47:28 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
16/02/04 14:47:28 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@ipaddress:46811] ->[akka.tcp://sparkExecutor@hostname:39720]: Error [Association failed with [akka.tcp://sparkExecutor@hostname:39720]][akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@hostname:39720]

Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hostname/ipaddress:39720

Comments:

How many partitions are there in mymodel when you call saveAsTextFile?

I explicitly tried the following before saveAsTextFile: mymodel.repartition(400), but it did not help.

You may want to look at the answer.
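The partition question raised in the comments can be checked directly from the RDD. The following is a minimal sketch, not the asker's actual job: the local master, the toy transactions, the helper name formatItemset, the object name PartitionCheck, and the temporary output directory are all assumptions made for illustration. It also shows that repartition returns a new RDD and leaves the original unchanged, so the call only affects the save if saveAsTextFile is invoked on the returned RDD. The post does not show which of the two was done in the original code.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth

object PartitionCheck {
  // Hypothetical helper: same string format as the map in the question.
  def formatItemset(items: Array[String], freq: Long): String =
    items.mkString("[", ",", "]") + ", " + freq

  def main(args: Array[String]): Unit = {
    // Assumption: a local master and toy data, only for illustration.
    val conf = new SparkConf().setAppName("fpm-partition-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val transactions = sc.parallelize(Seq(
        Array("a", "b", "c"),
        Array("a", "b"),
        Array("a", "c"),
        Array("b", "c")
      ))

      val model = new FPGrowth()
        .setMinSupport(0.5)
        .setNumPartitions(10)
        .run(transactions)

      val mymodel = model.freqItemsets.map(s => formatItemset(s.items, s.freq))

      // Number of partitions the save would write if called on mymodel.
      println(s"partitions before repartition: ${mymodel.partitions.length}")

      // repartition does not modify mymodel; it returns a new RDD.
      val repartitioned = mymodel.repartition(4)
      println(s"partitions after repartition: ${repartitioned.partitions.length}")

      // Assumption: a fresh temp directory, because saveAsTextFile
      // fails when the output path already exists.
      val out = Files.createTempDirectory("fpm_model").resolve("out").toString
      repartitioned.saveAsTextFile(out)
    } finally {
      sc.stop()
    }
  }
}
```

On a cluster, the same two println calls placed just before the save in the original job would answer the commenter's question without changing anything else.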