Scala: flatMap an RDD while keeping the rest of each entry
I'm working in Spark, and I have an RDD of the form:
(x_{11},x_{12}, x_{13}, Array(A_{1},A_{2},A_{3}))
(x_{21},x_{22}, x_{23}, Array(A_{1},A_{2}))
(x_{31},x_{32}, x_{33}, Array(A_{1}))
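I want to flatten the array values while keeping the x values. I know that if I only had the arrays, I could call flatMap and get one array element per row, but what I want to get is: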
(x_{11},x_{12}, x_{13}, A_{1})
(x_{11},x_{12}, x_{13}, A_{2})
(x_{11},x_{12}, x_{13}, A_{3})
(x_{21},x_{22}, x_{23}, A_{1})
(x_{21},x_{22}, x_{23}, A_{2})
(x_{31},x_{32}, x_{33}, A_{1})
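Basically, what I want is to repeat the row once for each item in its array. How can I do this in Spark with Scala?

You can use flatMap; just make sure the function you pass keeps the "prefix" columns for every value in the list: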
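import org.apache.spark.rdd.RDD

// Assumes `sc` is an existing SparkContext (as in spark-shell).
val input: RDD[(Int, Int, Int, Seq[String])] = sc.parallelize(Seq(
  (1, 2, 3, Seq("a", "b")),
  (5, 6, 7, Seq("c", "d", "e"))
))

// For each row, emit one tuple per list element, repeating the three prefix values.
val result: RDD[(Int, Int, Int, String)] =
  input.flatMap { case (i1, i2, i3, list) => list.map(e => (i1, i2, i3, e)) }

/* result:
(1,2,3,a)
(1,2,3,b)
(5,6,7,c)
(5,6,7,d)
(5,6,7,e)
*/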
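
One thing to watch for with this approach: a row whose list is empty produces no output rows at all, so it silently disappears from the result. The sketch below shows the same pattern on plain Scala collections (no Spark needed to try it), plus a variant that keeps such rows by emitting a single None. The object and method names (FlattenKeepPrefix, flatten, flattenKeepEmpty) are made up for illustration and are not part of any Spark API. The same function bodies should carry over to RDD.flatMap, but I have only written them against Seq here.

```scala
object FlattenKeepPrefix {
  // Hypothetical helper: repeats the three prefix values once per list element.
  // Rows with an empty list contribute nothing to the output.
  def flatten[A, B, C, D](rows: Seq[(A, B, C, Seq[D])]): Seq[(A, B, C, D)] =
    rows.flatMap { case (a, b, c, items) => items.map(item => (a, b, c, item)) }

  // Hypothetical variant: keeps rows with an empty list by emitting one None.
  def flattenKeepEmpty[A, B, C, D](rows: Seq[(A, B, C, Seq[D])]): Seq[(A, B, C, Option[D])] =
    rows.flatMap { case (a, b, c, items) =>
      if (items.isEmpty) Seq((a, b, c, Option.empty[D]))
      else items.map(item => (a, b, c, Option(item)))
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      (1, 2, 3, Seq("a", "b")),
      (5, 6, 7, Seq.empty[String]) // this row is dropped by flatten
    )
    flatten(rows).foreach(println)
    flattenKeepEmpty(rows).foreach(println)
  }
}
```

Whether to drop or keep empty rows depends on what the downstream code expects; the Option variant just makes that choice explicit.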