Scala 将一条记录转换为多条记录
如果输入的格式为Scala 将一条记录转换为多条记录,scala,apache-spark,Scala,Apache Spark,如果输入的格式为 (x1,(a,b,c,List(key1, key2)) (x2,(a,b,c,List(key3)) 我想实现这个输出 (key1,(a,b,c,x1)) (key2,(a,b,c,x1)) (key3,(a,b,c,x2)) 代码如下: var hashtags = joined_d.map(x => (x._1, (x._2._1._1, x._2._2, x._2._1._4, getHashTags(x._2._1._4)))) var hashtags
(x1,(a,b,c,List(key1, key2))
(x2,(a,b,c,List(key3))
我想实现这个输出
(key1,(a,b,c,x1))
(key2,(a,b,c,x1))
(key3,(a,b,c,x2))
代码如下:
var hashtags = joined_d.map(x => (x._1, (x._2._1._1, x._2._2, x._2._1._4, getHashTags(x._2._1._4))))
var hashtags_keys = hashtags.map(x => if(x._2._4.size == 0) (x._1, (x._2._1, x._2._2, x._2._3, 0)) else
x._2._4.map(y => (y, (x._2._1, x._2._2, x._2._3, 1))))
函数getHashTags()返回一个列表。如果列表不是空的,我们希望使用列表中的每个元素作为新键。我应该如何解决这个问题?将
rdd
创建为:
val rdd = sc.parallelize(
Seq(
("x1",("a","b","c",List("key1", "key2"))),
("x2", ("a", "b", "c", List("key3")))
)
)
您可以像这样使用flatMap
:
rdd.flatMap{ case (x, (a, b, c, list)) => list.map(k => (k, (a, b, c, x))) }.collect
// res12: Array[(String, (String, String, String, String))] =
// Array((key1,(a,b,c,x1)),
// (key2,(a,b,c,x1)),
// (key3,(a,b,c,x2)))
使用创建为以下内容的
rdd
:
val rdd = sc.parallelize(
Seq(
("x1",("a","b","c",List("key1", "key2"))),
("x2", ("a", "b", "c", List("key3")))
)
)
您可以像这样使用flatMap
:
rdd.flatMap{ case (x, (a, b, c, list)) => list.map(k => (k, (a, b, c, x))) }.collect
// res12: Array[(String, (String, String, String, String))] =
// Array((key1,(a,b,c,x1)),
// (key2,(a,b,c,x1)),
// (key3,(a,b,c,x2)))
这里有一种方法:
val rdd = sc.parallelize(Seq(
("x1", ("a", "b", "c", List("key1", "key2"))),
("x2", ("a", "b", "c", List("key3")))
))
val rdd2 = rdd.flatMap{
case (x, (a, b, c, l)) => l.map( (_, (a, b, c, x) ) )
}
rdd2.collect
// res1: Array[(String, (String, String, String, String))] = Array((key1,(a,b,c,x1)), (key2,(a,b,c,x1)), (key3,(a,b,c,x2)))
这里有一种方法:
val rdd = sc.parallelize(Seq(
("x1", ("a", "b", "c", List("key1", "key2"))),
("x2", ("a", "b", "c", List("key3")))
))
val rdd2 = rdd.flatMap{
case (x, (a, b, c, l)) => l.map( (_, (a, b, c, x) ) )
}
rdd2.collect
// res1: Array[(String, (String, String, String, String))] = Array((key1,(a,b,c,x1)), (key2,(a,b,c,x1)), (key3,(a,b,c,x2)))
尝试使用
flatMap
代替贴图转换。尝试使用flatMap
代替贴图转换。