Python 3.x: How to name KMeans clusters in PySpark
Tags: python-3.x, apache-spark, pyspark, apache-spark-sql, apache-spark-ml

I have the following code:

%pyspark
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans
from pyspark.ml import Pipeline
(trainingData, testData) = dataFrame.randomSplit([0.7, 0.3])
assembler = VectorAssembler(inputCols = ["PetalLength", "PetalWidth", "SepalLength", "SepalWidth"], outputCol="features")
kmeans = KMeans().setK(3).setSeed(101010)
pipeline = Pipeline(stages=[assembler, kmeans])
modelKMeans = pipeline.fit(dataFrame)

When I run this:

predictions = modelKMeans.transform(testData)
z.show(predictions)

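In the prediction column I would like to see "Iris setosa" instead of 0, "Iris versicolor" instead of 1, and "Iris virginica" instead of 2. Is that possible?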

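KMeans is a clustering algorithm, not a classification algorithm, so it has no idea what the clusters it produces correspond to. If you want "Iris setosa" instead of 0, you first have to check that the "Iris setosa" group really does correspond to cluster 0. You cannot know this beforehand, because the cluster ids are only assigned when the model is fit. Once you have checked, you can create a new column using a mapping: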

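from pyspark.sql.functions import col, when

# Column expression mapping each cluster id to a name; unmatched ids become null.
groups = when(col("prediction") == 0, "Iris setosa") \
    .when(col("prediction") == 1, "Iris versicolor") \
    .when(col("prediction") == 2, "Iris virginica") \
    .otherwise(None)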
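
The check the answer describes (finding out which cluster id corresponds to which species) can be sketched in plain Python as a majority vote per cluster. This is only an illustration under assumptions: the helper name `majority_label_per_cluster` is hypothetical, the sample pairs are made up rather than real model output, and it presumes the data carries a true-label column, which the question does not show.

```python
from collections import Counter, defaultdict


def majority_label_per_cluster(rows):
    """Map each cluster id to the most common true label among its rows.

    `rows` is an iterable of (cluster_id, true_label) pairs.
    """
    counts = defaultdict(Counter)
    for cluster_id, label in rows:
        counts[cluster_id][label] += 1
    # For each cluster, keep the label that occurs most often.
    return {cid: c.most_common(1)[0][0] for cid, c in counts.items()}


# Made-up sample pairs for illustration only (not real model output).
sample_rows = [
    (0, "Iris setosa"), (0, "Iris setosa"),
    (1, "Iris versicolor"), (1, "Iris versicolor"), (1, "Iris virginica"),
    (2, "Iris virginica"), (2, "Iris virginica"), (2, "Iris versicolor"),
]

cluster_names = majority_label_per_cluster(sample_rows)
print(cluster_names)
```

With Spark, the pairs could come from collecting the prediction column together with a label column, if the DataFrame has one. The resulting dictionary then tells you which name to put in each `when` branch. Because cluster ids can change between fits (for example with a different seed), the check should be repeated after refitting.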