Scala: User defined function in Spark 2.4?

scala apache-spark

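I am running a k-means algorithm. I created a VectorAssembler with inputCols set to ("longitude", "latitude") and outputCol set to ("location"). I need to cluster the data from a JSON file into 3 clusters. I group the data by longitude and latitude, and I create the location vector to combine the two. Longitude and latitude are both of type double. I suspect the location vector is the cause of the error I get, which is the following: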

19/04/08 15:20:56 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.SparkException: Failed to execute user defined function(VectorAssembler$$Lambda$1629/684426930: (struct<latitude:double,longitude:double>) => struct<type:tinyint,size:int,indices:array<int>,values:array<double>>)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)

This is the schema:

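All of your feature columns have nullable=true, so the data may contain nulls, and VectorAssembler throws an error when it encounters a null value (its default handleInvalid setting is "error"). Try setting handleInvalid to "skip". This filters out every row that contains a null in one of the input columns.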

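import org.apache.spark.ml.feature.VectorAssembler

// Combine latitude and longitude into a "location" vector; rows with nulls are dropped
val stationVA = new VectorAssembler().
                     setInputCols(Array("latitude","longitude")).
                     setOutputCol("location").
                     setHandleInvalid("skip")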
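
For context, here is a minimal sketch of how the assembler above might feed into k-means with 3 clusters. It is not the asker's actual job: the object name StationClustering, the helper names, the "cluster" output column, the seed, and the in-memory sample rows (standing in for the JSON file) are all assumptions made for illustration. Only the column names latitude, longitude and location come from the question.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object StationClustering {

  // Hypothetical sample data standing in for the JSON file.
  // Option[Double] produces nullable double columns, like the schema in the question.
  def sampleStations(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Option[Double], Option[Double])](
      (Some(37.77), Some(-122.42)),
      (Some(37.78), Some(-122.41)),
      (Some(37.33), Some(-121.89)),
      (Some(37.34), Some(-121.88)),
      (Some(37.44), Some(-122.16)),
      (Some(37.45), Some(-122.15)),
      (None, Some(-122.40)),   // null latitude
      (Some(37.50), None)      // null longitude
    ).toDF("latitude", "longitude")
  }

  // Build the "location" vector; handleInvalid = "skip" drops rows with a null input.
  def assembleLocations(stations: DataFrame): DataFrame =
    new VectorAssembler()
      .setInputCols(Array("latitude", "longitude"))
      .setOutputCol("location")
      .setHandleInvalid("skip")
      .transform(stations)

  // Fit k-means on the assembled vectors and append a "cluster" column (name is an assumption).
  def clusterLocations(assembled: DataFrame, k: Int = 3): DataFrame = {
    val model = new KMeans()
      .setK(k)
      .setSeed(1L)
      .setFeaturesCol("location")
      .setPredictionCol("cluster")
      .fit(assembled)
    model.transform(assembled)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("station-kmeans-sketch")
      .master("local[*]")
      .getOrCreate()

    val assembled = assembleLocations(sampleStations(spark))
    // The two sample rows that contain a null are no longer present here.
    clusterLocations(assembled).show(false)

    spark.stop()
  }
}
```

If silently dropping rows is not acceptable, handleInvalid also accepts "keep", or the nulls can be counted and handled before the assembler runs. Which option fits depends on the data, so treat "skip" here as one possible choice.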
