Apache Spark: How to merge multiple feature vectors in a DataFrame?

apache-spark machine-learning

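Using Spark ML transformers, I arrived at a DataFrame where each row looks like this:

Row(object_id, text_features_vector, color_features, type_features)

where text_features is a sparse vector of term weights, color_features is a small, 20-element dense vector of colors (from a one-hot encoder), and type_features is likewise a dense vector of types from a one-hot encoder.

What would be a good approach (using Spark's facilities) to merge these features into a single large array, so that I can measure things such as the cosine distance between any two objects?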

You can use VectorAssembler:

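import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.DataFrame

// Placeholder: your DataFrame holding the three feature columns.
val df: DataFrame = ???

// Input column names must match the actual column names in df.
val assembler = new VectorAssembler()
  .setInputCols(Array("text_features", "color_features", "type_features"))
  .setOutputCol("features")

// Appends a "features" column containing the concatenated vector.
val transformed = assembler.transform(df)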
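Once the columns are assembled, the cosine distance mentioned in the question can be computed on pairs of the resulting vectors. The following is only a minimal sketch: cosineDistance is a hypothetical helper, not part of Spark, and it assumes Spark 3.0 or later, where ml.linalg.Vector provides a dot method.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical helper (not a Spark API):
// cosine distance = 1 - (a . b) / (||a|| * ||b||)
def cosineDistance(a: Vector, b: Vector): Double = {
  require(a.size == b.size, "vectors must have the same dimension")
  val normA = Vectors.norm(a, 2.0)
  val normB = Vectors.norm(b, 2.0)
  require(normA > 0.0 && normB > 0.0, "cosine distance is undefined for zero vectors")
  1.0 - a.dot(b) / (normA * normB)
}

// Illustrative vectors; in practice these would be values of the "features" column.
val u = Vectors.sparse(4, Array(0, 2), Array(1.0, 3.0))
val v = Vectors.dense(1.0, 0.0, 3.0, 0.5)
val d = cosineDistance(u, v)
```

Because dot and norm accept both sparse and dense vectors, the assembled vector does not have to be converted to a dense array first, which matters when the text features are large and mostly zero.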

For a PySpark example, see: