Scala: converting one record into multiple records

If the input has the format

(x1,(a,b,c,List(key1, key2)))
(x2,(a,b,c,List(key3)))

I want to produce this output:

(key1,(a,b,c,x1))
(key2,(a,b,c,x1))
(key3,(a,b,c,x2))

The code is as follows:

var hashtags = joined_d.map(x => (x._1, (x._2._1._1, x._2._2, x._2._1._4, getHashTags(x._2._1._4))))

var hashtags_keys = hashtags.map(x => if(x._2._4.size == 0) (x._1, (x._2._1, x._2._2, x._2._3, 0)) else
x._2._4.map(y => (y, (x._2._1, x._2._2, x._2._3, 1))))

The function getHashTags() returns a list. If the list is not empty, we want to use each element of the list as a new key. How should I solve this?

Given an rdd created as:

val rdd = sc.parallelize(
    Seq(
        ("x1",("a","b","c",List("key1", "key2"))), 
        ("x2", ("a", "b", "c", List("key3")))
    )
)

You can use flatMap like this:

rdd.flatMap{ case (x, (a, b, c, list)) => list.map(k => (k, (a, b, c, x))) }.collect
// res12: Array[(String, (String, String, String, String))] = 
//        Array((key1,(a,b,c,x1)), 
//              (key2,(a,b,c,x1)), 
//              (key3,(a,b,c,x2)))
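
To see why flatMap is the right fit here, it may help to compare it with map on the same data. The sketch below uses a plain Scala List as a stand-in for the RDD (an assumption made so it runs without a SparkContext); the object name FlatMapDemo and the value names are hypothetical. The function passed to flatMap is the same one used on the rdd above.

```scala
object FlatMapDemo {
  // Hypothetical stand-in for the RDD contents: (id, (a, b, c, keys)).
  val records: List[(String, (String, String, String, List[String]))] = List(
    ("x1", ("a", "b", "c", List("key1", "key2"))),
    ("x2", ("a", "b", "c", List("key3")))
  )

  // map yields exactly one output element per input record,
  // so each record becomes a nested list of re-keyed tuples.
  val nested: List[List[(String, (String, String, String, String))]] =
    records.map { case (x, (a, b, c, keys)) => keys.map(k => (k, (a, b, c, x))) }

  // flatMap concatenates those per-record lists into one flat collection.
  val flattened: List[(String, (String, String, String, String))] =
    records.flatMap { case (x, (a, b, c, keys)) => keys.map(k => (k, (a, b, c, x))) }

  def main(args: Array[String]): Unit = {
    println(nested)
    println(flattened)
  }
}
```

With map the result still has two elements (one list per input record); with flatMap it has three, one per key.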

Here is one way:

val rdd = sc.parallelize(Seq(
  ("x1", ("a", "b", "c", List("key1", "key2"))),
  ("x2", ("a", "b", "c", List("key3")))
))

val rdd2 = rdd.flatMap{
  case (x, (a, b, c, l)) => l.map( (_, (a, b, c, x) ) )
}

rdd2.collect
// res1: Array[(String, (String, String, String, String))] = Array((key1,(a,b,c,x1)), (key2,(a,b,c,x1)), (key3,(a,b,c,x2)))

Try using flatMap instead of the map transformation.
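
The question also keeps records whose hashtag list is empty, under their original key and with a flag of 0. One possible way to fit that into flatMap is to return a list from both branches, so the empty case becomes a one-element list. This is only a sketch under assumptions: the field types are taken to be String, the original key and the hashtags are assumed to share the same type, and the names HashtagKeys, Record, rekey and sample are hypothetical. A plain List stands in for the hashtags RDD so the example runs without Spark.

```scala
object HashtagKeys {
  // Assumed record shape after the first map in the question:
  // (id, (field1, field2, text, hashtags)). All String types are assumptions.
  type Record = (String, (String, String, String, List[String]))

  // Both branches return a List, so the function has a single result type.
  def rekey(r: Record): List[(String, (String, String, String, Int))] = r match {
    // No hashtags: keep the original key and mark the record with 0.
    case (id, (f1, f2, text, Nil)) => List((id, (f1, f2, text, 0)))
    // Otherwise: emit one record per hashtag, marked with 1.
    case (_, (f1, f2, text, tags)) => tags.map(tag => (tag, (f1, f2, text, 1)))
  }

  // Hypothetical sample data.
  val sample: List[Record] = List(
    ("x1", ("a", "b", "t1", List("key1", "key2"))),
    ("x2", ("a", "b", "t2", Nil))
  )

  val result: List[(String, (String, String, String, Int))] = sample.flatMap(rekey)
}
```

On an RDD the same function could be passed as hashtags.flatMap(HashtagKeys.rekey), provided the element type matches Record.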