K-means in Spark (Scala): how to map cluster numbers back to customer IDs when the model was trained on standardized data

scala hadoop apache-spark

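The code below builds the model. The problem I am facing is mapping the cluster numbers back to customer IDs: the model was trained on standardized data, but the data that carries the customer IDs is not standardized. I can't work out how to map the clusters back.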

import org.apache.spark.SparkContext._
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.util.MLUtils
// Load the data to cluster
val data = sc.textFile("hdfs://path/data_for_clus1")
// Split each line on the Ctrl-A (U+0001) delimiter and drop field 0
val vectors = data.map(s => s.split('\u0001')).map(s => s.slice(1, s.size))
val parsedData = vectors.map(s => Vectors.dense(s.map(_.toDouble)))

val dataAsArray = parsedData.map(_.toArray)
// Standardize the features to zero mean and unit variance with StandardScaler
val features = dataAsArray.map(a => Vectors.dense(a))
val scaler = new StandardScaler(withMean = true, withStd = true).fit(features)
val scaledFeatures = scaler.transform(features)


val WSSEBuffer = ArrayBuffer[Double]()
// Train K-means on the standardized features
val numClusters = 20
val numIterations = 500
val clusters = KMeans.train(scaledFeatures, numClusters, numIterations)
// Within Set Sum of Squared Errors of the fitted model
val WSSSE = clusters.computeCost(scaledFeatures)

Using the "clusters" model, I want to assign a cluster number to each customer ID in the "data" table.
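To make the mismatch concrete, here is a minimal sketch in plain Scala (no Spark) of what has to happen for one customer: the raw features are z-scored with the mean and standard deviation learned from the training data, and only then compared with the cluster centers. The object `StandardizeThenAssign`, its helpers, and all numbers are hypothetical and chosen for illustration; treating a zero standard deviation as 0.0 is an assumption of this sketch.

```scala
object StandardizeThenAssign {
  // Z-score each feature using the mean and standard deviation of the TRAINING data.
  // Assumption of this sketch: a zero standard deviation maps the feature to 0.0.
  def standardize(x: Array[Double], mean: Array[Double], std: Array[Double]): Array[Double] =
    x.indices.map(i => if (std(i) == 0.0) 0.0 else (x(i) - mean(i)) / std(i)).toArray

  // Index of the center with the smallest squared Euclidean distance to x.
  def nearest(x: Array[Double], centers: Array[Array[Double]]): Int =
    centers.indices.minBy { k =>
      centers(k).zip(x).map { case (c, v) => (c - v) * (c - v) }.sum
    }

  // Made-up numbers, for illustration only.
  val mean = Array(100.0, 10.0)
  val std = Array(50.0, 5.0)
  val centers = Array(Array(-1.0, -1.0), Array(1.0, 1.0)) // centers live in standardized space
  val customer = ("c42", Array(60.0, 8.0))                // (customer ID, raw features)

  def main(args: Array[String]): Unit = {
    val (id, raw) = customer
    println(id -> nearest(standardize(raw, mean, std), centers)) // (c42,0)
    println(id -> nearest(raw, centers))                         // (c42,1): raw features pick a different center
  }
}
```

The customer ID is never fed to the model; it simply travels next to the features, so the only requirement is that the same scaling used for training is applied before the lookup.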

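Parse the data as

val newdata = Array[(customerID, featureArray)]

then apply the fitted scaler to each featureArray and pass the result to clusters.predict, keeping the customerID next to the predicted cluster number.

I am not sure whether this is an efficient approach.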
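A possible way to write this against the MLlib API, as a sketch rather than a definitive implementation: keep the customer ID paired with its feature vector, then run each vector through the fitted `StandardScalerModel` and `KMeansModel.predict`. The object `ClusterAssignment` and its methods are hypothetical names. The sketch also assumes the customer ID is field 0 of each line, which matches the question's code dropping that field but is not confirmed by it.

```scala
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.feature.StandardScalerModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object ClusterAssignment {
  // Assumption: field 0 is the customer ID, the remaining fields are numeric features,
  // and fields are separated by the Ctrl-A (U+0001) character as in the question.
  def parseLine(line: String): (String, Vector) = {
    val fields = line.split('\u0001')
    (fields.head, Vectors.dense(fields.tail.map(_.toDouble)))
  }

  // Standardize each customer's features with the SAME fitted scaler that was used
  // for training, then look up the nearest cluster center.
  def assign(lines: RDD[String],
             scaler: StandardScalerModel,
             model: KMeansModel): RDD[(String, Int)] =
    lines.map(parseLine).map { case (id, features) =>
      (id, model.predict(scaler.transform(features)))
    }
}
```

With the values from the question this would be called as `ClusterAssignment.assign(data, scaler, clusters)`, giving an RDD of (customer ID, cluster number) pairs. It is a single map over the distributed data, so nothing has to be collected into a local Array on the driver.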
