
Convert an array in a Spark DataFrame to a DenseVector using Java


I am running Spark 2.3. I want to convert the features column in the following DataFrame from ArrayType to DenseVector. I am using Spark with Java:

+---+--------------------+
| id|            features|
+---+--------------------+
|  0|[4.191401, -1.793...|
| 10|[-0.5674514, -1.3...|
| 20|[0.735613, -0.026...|
| 30|[-0.030161237, 0....|
| 40|[-0.038345724, -0...|
+---+--------------------+

root
 |-- id: integer (nullable = false)
 |-- features: array (nullable = true)
 |    |-- element: float (containsNull = false)
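For reference, a DataFrame with this shape can be built along the following lines (a minimal sketch; the session setup and literal values are illustrative rather than my exact code):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

SparkSession spark = SparkSession.builder()
    .appName("ArrayToDenseVector")
    .master("local[*]")
    .getOrCreate();

// id: non-nullable integer, features: nullable array of non-null floats
StructType schema = new StructType(new StructField[]{
    new StructField("id", DataTypes.IntegerType, false, Metadata.empty()),
    new StructField("features",
        DataTypes.createArrayType(DataTypes.FloatType, false), true, Metadata.empty())
});

List<Row> rows = Arrays.asList(
    RowFactory.create(0, Arrays.asList(4.191401f, -1.793f)),
    RowFactory.create(10, Arrays.asList(-0.5674514f, -1.3f))
);

Dataset<Row> df3 = spark.createDataFrame(rows, schema);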
I have written the following UDF, but it does not seem to work:

private static UDF1 toVector = new UDF1<Float[], Vector>() {

    private static final long serialVersionUID = 1L;

    @Override
    public Vector call(Float[] t1) throws Exception {
        // widen each boxed Float to a primitive double
        double[] doubleArray = new double[t1.length];
        for (int i = 0; i < t1.length; i++) {
            doubleArray[i] = (double) t1[i];
        }
        return (org.apache.spark.mllib.linalg.Vector) Vectors.dense(doubleArray);
    }
};
I register and run it as follows:

spark.udf().register("toVector", (UserDefinedAggregateFunction) toVector);
df3 = df3.withColumn("featuresnew", callUDF("toVector", df3.col("features")));
df3.show();

However, when I run this snippet of code, I get the following error:

ReadProcessData$1 cannot be cast to org.apache.spark.sql.expressions.UserDefinedAggregateFunction


The problem lies in how you are registering the udf in Spark. You should not cast it to UserDefinedAggregateFunction, which is not a udf but a udaf, used for aggregations. Instead, what you should do is:

spark.udf().register("toVector", toVector, new VectorUDT());
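The third argument declares the udf's return type. Because the udf above returns an org.apache.spark.mllib.linalg.Vector, new VectorUDT() here should be the matching org.apache.spark.mllib.linalg.VectorUDT. If you work with the newer org.apache.spark.ml.linalg.Vector instead, the corresponding type is exposed through SQLDataTypes; a hypothetical registration for that case would be:

spark.udf().register("toVector", toVector, org.apache.spark.ml.linalg.SQLDataTypes.VectorType());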
Then, to use the registered function, call:

df3 = df3.withColumn("featuresnew", callUDF("toVector", df3.col("features")));
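Here callUDF is the static helper from org.apache.spark.sql.functions, so the snippet assumes import static org.apache.spark.sql.functions.callUDF;. Also note that withColumn returns a new Dataset, which is why the result is reassigned to df3.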
The udf itself should be slightly adjusted as follows:
UDF1<Seq<Float>, Vector> toVector = new UDF1<Seq<Float>, Vector>() {

  @Override
  public Vector call(Seq<Float> t1) throws Exception {
    // an array column arrives in a Java udf as a Scala Seq,
    // so convert it to a Java List before copying the values
    List<Float> list = scala.collection.JavaConversions.seqAsJavaList(t1);
    double[] doubleArray = new double[t1.length()];
    for (int i = 0; i < list.size(); i++) {
      doubleArray[i] = list.get(i);
    }
    return Vectors.dense(doubleArray);
  }
};
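Putting the pieces together, a minimal end-to-end sketch, assuming the spark session and df3 DataFrame from the question and the mllib classes used above:

import static org.apache.spark.sql.functions.callUDF;

import java.util.List;

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.VectorUDT;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.sql.api.java.UDF1;

import scala.collection.Seq;

// register the corrected udf together with its return type
spark.udf().register("toVector", toVector, new VectorUDT());

// Datasets are immutable, so keep the transformed result
df3 = df3.withColumn("featuresnew", callUDF("toVector", df3.col("features")));
df3.show();
df3.printSchema();  // featuresnew is now of type vector

One caveat: scala.collection.JavaConversions is deprecated in newer Scala versions; inside the udf, scala.collection.JavaConverters.seqAsJavaListConverter(t1).asJava() is the non-deprecated equivalent.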


@bengineer: For machine learning in Spark, vectors (DenseVector or SparseVector) are used as input, not arrays. There may be other use cases as well.
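To illustrate the comment above, a minimal sketch of feeding the converted column into an ML estimator. This assumes the udf is rewritten against the org.apache.spark.ml.linalg package, since the ml estimators expect the newer vector type, and that df3 carries the featuresnew column produced above:

import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;

// ml algorithms consume a Vector column, not an array column
KMeans kmeans = new KMeans()
    .setK(2)                        // number of clusters, chosen arbitrarily here
    .setFeaturesCol("featuresnew"); // the column produced by the udf
KMeansModel model = kmeans.fit(df3);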