Scala: Flatten a Spark DataFrame and name the column (tags: scala, apache-spark, dataframe)


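How can I unnest an array in a Spark DataFrame so that the resulting DataFrame contains one row for each value in the original array?

For example: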

scala> df.show()
+---------+------+
|employees|person|
+---------+------+
|[1, 2, 3]|  Mary|
|[4, 5, 6]|  John|
+---------+------+
Expected result:

+---------+------+
|employee |person|
+---------+------+
|1        |  Mary|
|2        |  Mary|
|3        |  Mary|
|4        |  John|
|5        |  John|
|6        |  John|
+---------+------+
This is what I tried:

df.select($"person", explode($"employees")).show()

+------+---+
|person|col|
+------+---+
|  Mary|  1|
|  Mary|  2|
|  Mary|  3|
|  John|  4|
|  John|  5|
|  John|  6|
+------+---+
How do I name the exploded column "employee"?


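You can give the exploded column an alias with as inside select:

df.select($"person", explode($"employees").as("employee")).show()

Alternatively, use withColumn to create the new column:

df.withColumn("employee", explode($"employees")).show()

Note that withColumn keeps the original employees array column alongside the new one; append .drop("employees") if you want only employee and person, as in the expected result.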
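
For anyone who wants to try this outside the spark-shell, here is a minimal self-contained sketch of the alias approach. The object name ExplodeExample, the helper flatten, the app name, and the local master setting are assumptions made for illustration; only explode, as, select, and the column names come from the question and answer above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeExample {
  // Hypothetical helper: one output row per array element,
  // with the exploded column named "employee".
  def flatten(df: DataFrame): DataFrame =
    df.select(explode(col("employees")).as("employee"), col("person"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-example")
      .getOrCreate()
    import spark.implicits._

    // Rebuild the DataFrame shown in the question.
    val df = Seq(
      (Seq(1, 2, 3), "Mary"),
      (Seq(4, 5, 6), "John")
    ).toDF("employees", "person")

    flatten(df).show()
    spark.stop()
  }
}
```

Using select with an alias also controls the column order, so employee comes first as in the expected result, and the original array column is left out without a separate drop.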