Python: Adding a value to a DenseVector in PySpark
I have a DataFrame that I have processed into:
+---------+-------+
| inputs | temp |
+---------+-------+
| [1,0,0] | 12 |
+---------+-------+
| [0,1,0] | 10 |
+---------+-------+
...
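`inputs` is a column of DenseVectors and `temp` is a column of values. I want to append each value to its DenseVector to create a single column, but I'm not sure how to start. Any tips for producing this desired output: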
+---------------+
| inputsMerged |
+---------------+
| [1,0,0,12] |
+---------------+
| [0,1,0,10] |
+---------------+
...
Edit: I tried using the VectorAssembler method, but the resulting array was not what I expected.
You can do the following:
df.show()
+-------------+----+
| inputs|temp|
+-------------+----+
|[1.0,0.0,0.0]| 12|
|[0.0,1.0,0.0]| 10|
+-------------+----+
df.printSchema()
root
|-- inputs: vector (nullable = true)
|-- temp: long (nullable = true)
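Imports (the `pyspark.ml.linalg` module is assumed here; if the column holds `pyspark.mllib` vectors, import from `pyspark.mllib.linalg` instead):
import pyspark.sql.functions as F
from pyspark.ml.linalg import Vectors, VectorUDT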
Create a UDF that merges the vector and the element:
concat = F.udf(lambda v, e: Vectors.dense(list(v) + [e]), VectorUDT())
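Comment: How do you want the values appended? Something like [1,0,0,12],[0,1,0,10]?
Reply: Yes, I'll edit that into the question for clarity, but that is what I'm after.
Apply the UDF to the inputs and temp columns: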
merged_df = df.select(concat(df.inputs, df.temp).alias('inputsMerged'))
merged_df.show()
+------------------+
| inputsMerged|
+------------------+
|[1.0,0.0,0.0,12.0]|
|[0.0,1.0,0.0,10.0]|
+------------------+
merged_df.printSchema()
root
|-- inputsMerged: vector (nullable = true)
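The per-row logic inside the UDF can be tried without a Spark session. The following is a minimal sketch in plain Python, assuming each vector can be iterated like a list of numbers; `append_value` is a hypothetical helper name, not part of PySpark, and plain lists stand in for DenseVector here.

```python
# Plain-Python stand-in for the lambda used inside the UDF; no Spark needed.
# `append_value` is a hypothetical helper, not a PySpark function.

def append_value(vector, value):
    """Return the elements of `vector` followed by `value`, all as floats."""
    return [float(x) for x in vector] + [float(value)]


# Sample rows mirroring the (inputs, temp) columns from the question.
rows = [([1.0, 0.0, 0.0], 12), ([0.0, 1.0, 0.0], 10)]

merged = [append_value(v, t) for v, t in rows]
print(merged)  # [[1.0, 0.0, 0.0, 12.0], [0.0, 1.0, 0.0, 10.0]]
```

In Spark, the same helper could presumably be wrapped as F.udf(lambda v, e: Vectors.dense(append_value(v, e)), VectorUDT()). Note that this sketch does not handle a null temp value: float(None) raises a TypeError, so nulls would need to be filtered or filled first.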