How do I convert an org.apache.spark.rdd.RDD[Array[Double]] into the Array[Double] required by Spark MLlib?

I am trying to implement KMeans using Apache Spark:
val data = sc.textFile(irisDatasetString)
val parsedData = data.map(_.split(',').map(_.toDouble)).cache()
val clusters = KMeans.train(parsedData,3,numIterations = 20)
For this I get the following error:
error: overloaded method value train with alternatives:
(data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector],k: Int,maxIterations: Int,runs: Int)org.apache.spark.mllib.clustering.KMeansModel <and>
(data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector],k: Int,maxIterations: Int)org.apache.spark.mllib.clustering.KMeansModel <and>
(data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector],k: Int,maxIterations: Int,runs: Int,initializationMode: String)org.apache.spark.mllib.clustering.KMeansModel
cannot be applied to (org.apache.spark.rdd.RDD[Array[Double]], Int, numIterations: Int)
val clusters = KMeans.train(parsedData,3,numIterations = 20)
So I then tried converting the data to a single Vector:

val vectorData: Vector = Vectors.dense(parsedData)

which produced:
error: type Vector takes type parameters
val vectorData: Vector = Vectors.dense(parsedData)
^
error: overloaded method value dense with alternatives:
(values: Array[Double])org.apache.spark.mllib.linalg.Vector <and>
(firstValue: Double,otherValues: Double*)org.apache.spark.mllib.linalg.Vector
cannot be applied to (org.apache.spark.rdd.RDD[Array[Double]])
val vectorData: Vector = Vectors.dense(parsedData)
From this I infer that org.apache.spark.rdd.RDD[Array[Double]] is not the same thing as Array[Double].

How can I proceed with my data as an org.apache.spark.rdd.RDD[Array[Double]]? Or how do I convert an org.apache.spark.rdd.RDD[Array[Double]] into an Array[Double]?
KMeans.train expects an RDD[Vector], not an RDD[Array[Double]]. It seems to me that all you need to do is change

val parsedData = data.map(_.split(',').map(_.toDouble)).cache()

to

val parsedData = data.map(line => Vectors.dense(line.split(',').map(_.toDouble))).cache()
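For reference, the shape of this fix can be checked without a Spark cluster: the per-line parsing is plain Scala, and only the outer container changes from RDD[Array[Double]] to RDD[Vector]. Below is a minimal runnable sketch that uses an ordinary List in place of the RDD and a hypothetical Vec case class standing in for org.apache.spark.mllib.linalg.Vector (in real Spark code you would call Vectors.dense instead):

```scala
object ParseSketch {
  // Hypothetical stand-in for org.apache.spark.mllib.linalg.Vector;
  // real Spark code would build these with Vectors.dense(...).
  final case class Vec(values: Array[Double])

  // Same lambda shape as the fix above: one CSV line -> one dense vector.
  def parseLine(line: String): Vec =
    Vec(line.split(',').map(_.toDouble))

  def main(args: Array[String]): Unit = {
    // A plain List stands in for the RDD; .map transforms it the same way,
    // yielding List[Vec] where Spark would yield RDD[Vector].
    val data = List("5.1,3.5,1.4,0.2", "4.9,3.0,1.4,0.2")
    val parsedData = data.map(parseLine)
    parsedData.foreach(v => println(v.values.mkString(",")))
  }
}
```

The key point the sketch illustrates is that the conversion to a vector happens inside the map, element by element, so the data never has to be collected to the driver.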
No, that doesn't work. I now get an error with the expanded function ((x$1) => x$1.split(',').map((x$2) => x$2.toDouble)). I also tried it with Vectors.dense inside the map, as in val parsedData = data.map(line => Vectors.dense(line.split(',').map(_.toDouble))).cache(). With that I get parsedData of type org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector], and I then tried converting it to a vector with val dataArray = parsedData.collect and val dataVector = Vectors.dense(dataArray), since my dataArray is an Array[org.apache.spark.mllib.linalg.Vector] and Vectors.dense needs an Array[Double].

Why do you want the RDD[Vector] to be a single Vector? KMeans.train takes an RDD[Vector].

You're right :) For some reason I thought I had to collect the data and then pass it to k-means. Your solution works :) Thanks.

Hey climbage, how would you write the same thing in pyspark? I'm trying to get multivariate statistics for the data in a CSV file. The function needs an RDD[Vectors] and I can't figure out how to get them.