IllegalArgumentException when computing PCA with Spark ML (Scala, Apache Spark)

I have a Parquet file that contains an id column and a features column, and I want to apply the PCA algorithm to it:

val dataset =  spark.read.parquet("/usr/local/spark/dataset/data/user")
val features = new VectorAssembler()
    .setInputCols(Array("id", "features" ))
    .setOutputCol("features")
val pca = new PCA()
     .setInputCol("features")
     .setK(50)
     .fit(dataset)
     .setOutputCol("pcaFeatures")
val result = pca.transform(dataset).select("pcaFeatures")
pca.save("/usr/local/spark/dataset/out")

But I get this exception:

java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually ArrayType(DoubleType,true)

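Spark's PCA transformer needs a vector column as input, typically one created by a VectorAssembler. Here you create a VectorAssembler but never use it: it is never applied to dataset, so PCA is fitted on the original features column.

Also, the VectorAssembler only accepts numeric columns as input. I don't know what the type of features is, but if it is an array, it won't work. Convert it into numeric columns first.

Finally, it is a bad idea to give the assembled column the same name as one of the original columns. The VectorAssembler does not remove its input columns, so you would end up with two features columns.

Here is a working example of a PCA computation in Spark: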
import org.apache.spark.ml.feature._
val df = spark.range(10)
    .select('id, ('id * 'id) as "id2", ('id * 'id * 'id) as "id3")
val assembler = new VectorAssembler()
    .setInputCols(Array("id", "id2", "id3")).setOutputCol("features")
val assembled_df = assembler.transform(df)
val pca = new PCA()
    .setInputCol("features").setOutputCol("pcaFeatures").setK(2)
    .fit(assembled_df)
val result = pca.transform(assembled_df)
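
The exception message suggests that features is already an array of doubles. In that case, one possible approach is to convert the array column into an ML vector with a small UDF and feed that column to PCA directly. The following is only a sketch under assumptions: the in-memory dataset is a hypothetical stand-in for the Parquet file, the column name featuresVec and the value k = 2 are chosen for illustration, and the UDF assumes the arrays contain no nulls.

```scala
import org.apache.spark.ml.feature.PCA
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// In spark-shell a session named `spark` already exists; this line then simply reuses it.
val spark = SparkSession.builder().master("local[*]").appName("pca-array-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the Parquet data: an id plus an array<double> column.
val dataset = Seq(
  (1L, Array(1.0, 2.0, 3.0)),
  (2L, Array(4.0, 5.0, 6.5)),
  (3L, Array(7.0, 8.5, 9.0))
).toDF("id", "features")

// Convert array<double> to an ML Vector. Assumes no null arrays and no null elements.
val toVector = udf((xs: Seq[Double]) => Vectors.dense(xs.toArray))

// Use a new column name (featuresVec, an illustrative choice) instead of overwriting features.
val withVectors = dataset.withColumn("featuresVec", toVector(col("features")))

val pcaModel = new PCA()
  .setInputCol("featuresVec")
  .setOutputCol("pcaFeatures")
  .setK(2) // k must not exceed the vector length (3 in this toy data)
  .fit(withVectors)

val result = pcaModel.transform(withVectors).select("id", "pcaFeatures")
```

Writing the vector to a separate column keeps the original array column intact and avoids the name clash described above. With the real data, k = 50 only makes sense if every array has at least 50 elements.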
Could you show us the output of dataset.printSchema, please?