PySpark WARN BisectingKMeans: The input RDD is not directly cached


I am running bisecting k-means:

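from pyspark.ml.clustering import BisectingKMeans

# Bisecting k-means with 5 clusters and a fixed seed
bkm_test = BisectingKMeans().setK(5).setSeed(1)

# Cache both DataFrames before fitting
rdf.cache()
assembled.cache()
model_test = bkm_test.fit(assembled)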

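I cached both DataFrames because I kept getting the warning, but it made no difference. I found a similar report for KMeans. I also get the Executor warning shown below. Is this just a part of the algorithm that I cannot fix?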

17/08/14 21:53:17 WARN BisectingKMeans: The input RDD 306 is not directly cached, which may hurt performance if its parent RDDs are also not cached.
17/08/14 21:53:17 WARN Executor: 1 block locks were not released by TID = 132:
[rdd_302_0]

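The warning comes from the way the ML API wraps MLlib. MLlib works on an RDD of vectors, whereas Spark ML is DataFrame-oriented, so the ML version of BisectingKMeans converts the DataFrame into an RDD before handing it to the MLlib implementation. That converted RDD is not cached, so you end up with the warning even though the DataFrame itself is cached.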

Hopefully it does not cause a big slowdown. I have not found an easy way to force caching of the converted RDD.
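To see the distinction described above for yourself, the following is a minimal sketch that compares the storage level of a cached DataFrame with the storage level of an RDD derived from it. The helper names `is_persisted`, `caching_report` and `demo` are hypothetical, and the small `assembled` DataFrame inside `demo` is only a stand-in for the one in the question. Note that `df.rdd` on the Python side is an analogy: the conversion that triggers the warning happens inside Spark itself, so this only illustrates that caching a DataFrame does not mark an RDD built from it as cached.

```python
def is_persisted(level):
    """Return True if a StorageLevel keeps data in memory, on disk, or off heap."""
    return bool(level.useMemory or level.useDisk or level.useOffHeap)


def caching_report(df):
    """Compare the DataFrame's storage level with that of the RDD derived from it."""
    return {
        "dataframe_cached": is_persisted(df.storageLevel),
        "derived_rdd_cached": is_persisted(df.rdd.getStorageLevel()),
    }


def demo():
    # Imports are local so the helpers above can be loaded without PySpark.
    from pyspark.sql import SparkSession
    from pyspark.ml.linalg import Vectors

    spark = (
        SparkSession.builder.master("local[1]").appName("cache-check").getOrCreate()
    )
    # Hypothetical stand-in for the `assembled` DataFrame from the question.
    assembled = spark.createDataFrame(
        [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),)],
        ["features"],
    )
    assembled.cache()
    print(caching_report(assembled))
    spark.stop()


# demo()  # uncomment in an environment where PySpark is installed
```

If the report shows the DataFrame as cached and the derived RDD as not cached, that matches the explanation above: the cache is attached to the DataFrame, not to the RDD that is created from it for the clustering step.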