
Can I return a Tuple2 from an Apache Spark UDF (Java)?


I need a UDF2 that takes two arguments, corresponding to two DataFrame columns of type String and mllib.linalg.Vector, and returns a Tuple2. Is this possible? If so, how do I register this UDF?

The UDF is defined as follows:

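import org.apache.spark.sql.api.java.UDF2;
import scala.Tuple2;

// Packs the two input columns into a single Scala tuple
UDF2<String, org.apache.spark.mllib.linalg.Vector, Tuple2<String, org.apache.spark.mllib.linalg.Vector>> get_item_data =
        (String id, org.apache.spark.mllib.linalg.Vector features) -> {
            return new Tuple2<>(id, features);
        };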

It works if you register the UDF with a struct schema as its return type. The schema can be defined as follows:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.mllib.linalg.VectorUDT;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;

// Return type of get_item_data: a struct whose fields line up with Tuple2._1 and Tuple2._2
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("id", DataTypes.StringType, false));
fields.add(DataTypes.createStructField("features", new VectorUDT(), false));
DataType schema = DataTypes.createStructType(fields);
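To show how the registration itself might look, here is a minimal end-to-end sketch. It assumes Spark 2.x or later (the SparkSession API) with spark-sql and spark-mllib on the classpath; the class name TupleUdfExample, the method names and the sample rows are hypothetical. It passes the struct schema as the third argument of `spark.udf().register(...)`, and relies on Spark's type conversion mapping the tuple's `_1` and `_2` positionally onto the struct's `id` and `features` fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.VectorUDT;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import scala.Tuple2;

import static org.apache.spark.sql.functions.col;

// Hypothetical example class, not part of any Spark API.
public class TupleUdfExample {

    // Struct return type; field order must follow Tuple2._1, Tuple2._2.
    static DataType itemSchema() {
        List<StructField> fields = new ArrayList<>();
        fields.add(DataTypes.createStructField("id", DataTypes.StringType, false));
        fields.add(DataTypes.createStructField("features", new VectorUDT(), false));
        return DataTypes.createStructType(fields);
    }

    static Dataset<Row> run(SparkSession spark) {
        // Same logic as the question's get_item_data.
        UDF2<String, Vector, Tuple2<String, Vector>> getItemData =
                (id, features) -> new Tuple2<>(id, features);

        // Register under a SQL name, with the struct schema as the return type.
        spark.udf().register("get_item_data", getItemData, itemSchema());

        // Hypothetical input: two rows holding an id and an mllib dense vector.
        StructType inputSchema = new StructType(new StructField[] {
                DataTypes.createStructField("id", DataTypes.StringType, false),
                DataTypes.createStructField("features", new VectorUDT(), false)
        });
        List<Row> rows = Arrays.asList(
                RowFactory.create("a", Vectors.dense(1.0, 2.0)),
                RowFactory.create("b", Vectors.dense(3.0, 4.0)));
        Dataset<Row> df = spark.createDataFrame(rows, inputSchema);

        // The UDF yields one struct column; unpack its fields by name.
        return df.selectExpr("get_item_data(id, features) AS item")
                 .select(col("item.id"), col("item.features"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("tuple-udf-sketch")
                .getOrCreate();
        run(spark).show(false);
        spark.stop();
    }
}
```

Because the tuple-to-struct mapping is positional, the order of the fields in the schema has to match the order of the tuple's elements; the field names are free. If you would rather not depend on Scala tuples from Java, returning `RowFactory.create(id, features)` with the same schema is a common alternative.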