Apache Spark: combining columns into a nested array


How can I combine columns in Spark into a nested array?

val inputSmall = Seq(
    ("A", 0.3, "B", 0.25),
    ("A", 0.3, "g", 0.4),
    ("d", 0.0, "f", 0.1),
    ("d", 0.0, "d", 0.7),
    ("A", 0.3, "d", 0.7),
    ("d", 0.0, "g", 0.4),
    ("c", 0.2, "B", 0.25)).toDF("column1", "transformedCol1", "column2", "transformedCol2")
to get something like:

+-------+---------------+---------------+----------+
|column1|transformedCol1|transformedCol2|  combined|
+-------+---------------+---------------+----------+
|      A|            0.3|            0.3|[0.3, 0.3]|
+-------+---------------+---------------+----------+

If you want to combine multiple columns into a new column of `ArrayType`, you can use the `array` function:

import org.apache.spark.sql.functions._
val result = inputSmall.withColumn("combined", array($"transformedCol1", $"transformedCol2"))
result.show()
+-------+---------------+-------+---------------+-----------+
|column1|transformedCol1|column2|transformedCol2|   combined|
+-------+---------------+-------+---------------+-----------+
|      A|            0.3|      B|           0.25|[0.3, 0.25]|
|      A|            0.3|      g|            0.4| [0.3, 0.4]|
|      d|            0.0|      f|            0.1| [0.0, 0.1]|
|      d|            0.0|      d|            0.7| [0.0, 0.7]|
|      A|            0.3|      d|            0.7| [0.3, 0.7]|
|      d|            0.0|      g|            0.4| [0.0, 0.4]|
|      c|            0.2|      B|           0.25|[0.2, 0.25]|
+-------+---------------+-------+---------------+-----------+

What if you want to combine an arbitrary sequence of columns? A call such as

val names = Seq("foo", "bar")

followed by

.withColumn("combined", array(names: _*))

is not supported, which might suggest this can't be done when the names change dynamically. Aha — `array` accepts multiple columns, not multiple strings, so mapping each name to a `Column` first works:

val names = Seq("foo", "bar"); frame.withColumn("combined", array(names.map(frame(_)): _*))
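The dynamic variant above can be sketched end to end as follows. This is a minimal, self-contained example assuming a local `SparkSession`; the DataFrame and the column names in `names` are illustrative, not from the original question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.array

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("combine-columns")
  .getOrCreate()
import spark.implicits._

val frame = Seq(("A", 0.3, 0.25), ("d", 0.0, 0.1))
  .toDF("column1", "transformedCol1", "transformedCol2")

// Column names known only at runtime:
val names = Seq("transformedCol1", "transformedCol2")

// array(...) takes Column arguments, so map each name to a Column
// on `frame`, then splat the resulting Seq[Column] with `: _*`.
val combined = frame.withColumn("combined", array(names.map(frame(_)): _*))
combined.show()
```

`frame("name")` resolves the column against that specific DataFrame; `functions.col(name)` works equally well here since only one DataFrame is involved.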