How to name KMeans clusters in PySpark
I have the following code:
%pyspark
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans
from pyspark.ml import Pipeline
(trainingData, testData) = dataFrame.randomSplit([0.7, 0.3])
assembler = VectorAssembler(inputCols = ["PetalLength", "PetalWidth", "SepalLength", "SepalWidth"], outputCol="features")
kmeans = KMeans().setK(3).setSeed(101010)
pipeline = Pipeline(stages=[assembler, kmeans])
modelKMeans = pipeline.fit(dataFrame)
When I run this:
predictions = modelKMeans.transform(testData)
z.show(predictions)
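In the prediction column I would like to see "Iris setosa" instead of 0, "Iris versicolor" instead of 1, and "Iris virginica" instead of 2. Is that possible?

KMeans is a clustering algorithm, not a classification algorithm, so it has no idea what the clusters it produces correspond to. The cluster ids are arbitrary integers. If you want "Iris setosa" instead of 0, you first have to check that the "Iris setosa" group really is cluster 0. You cannot know this beforehand; you can only check it after the model has been fitted. Once you know the correspondence, you can create a new column with a mapping: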
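from pyspark.sql.functions import col, when
# Map each cluster id to a name; check the ids against your own fitted clusters first
groups = when(col("prediction") == 0, "Iris setosa") \
    .when(col("prediction") == 1, "Iris versicolor") \
    .when(col("prediction") == 2, "Iris virginica") \
    .otherwise(None)
# "species" is only an illustrative name for the new column
predictions = predictions.withColumn("species", groups)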
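
The step of checking which cluster id belongs to which species can be sketched as a majority vote. This is only a minimal illustration in plain Python: it assumes you have a known species label for at least some rows (the question does not show such a column) and that you have collected (cluster id, label) pairs from the predictions. The function name `majority_label_per_cluster` and the sample pairs are hypothetical. In Spark itself, something like `predictions.crosstab(...)` on the label and prediction columns would probably give you the same counts to inspect.

```python
from collections import Counter, defaultdict

def majority_label_per_cluster(pairs):
    """Return {cluster_id: most frequent label} for (cluster_id, label) pairs."""
    counts = defaultdict(Counter)
    for cluster_id, label in pairs:
        counts[cluster_id][label] += 1
    # most_common(1) yields the single (label, count) entry with the highest count
    return {cid: c.most_common(1)[0][0] for cid, c in counts.items()}

# Hypothetical sample; in practice these pairs would come from the predictions
pairs = [
    (0, "Iris versicolor"), (0, "Iris versicolor"), (0, "Iris virginica"),
    (1, "Iris setosa"), (1, "Iris setosa"),
    (2, "Iris virginica"), (2, "Iris virginica"), (2, "Iris versicolor"),
]
cluster_names = majority_label_per_cluster(pairs)
print(cluster_names)
```

The resulting dictionary tells you which name to put next to each id in the `when` chain. Note that the assignment can change if you refit with a different seed or different data, so it should be rechecked after every fit.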