Scala: in Spark, merge every 3 lines of an RDD into one line


I currently have a data file where each line looks like this:

A
B
C
QW
OO
P
...
Now I want to merge every three lines into one, like this:

ABC
QWOOP
...
What code should I write to do this?

e.g. val data = sc.textFile("path")
Thanks.

You can pair each line with its index using zipWithIndex, key each line by index / 3 so that every group of three lines shares a key, and then rebuild each group in order:

// Read the file as an RDD of lines
val lineRdd = sc.textFile("path")

val yourRequiredRdd = lineRdd
  .zipWithIndex                                                // attach a global line index
  .map({ case (line, index) => (index / 3, (index, line)) })   // key lines in groups of three
  .aggregateByKey(List.empty[(Long, String)])(
    // within a partition: prepend each (index, line) to the group's list
    { case (aggrList, (index, line)) => (index, line) :: aggrList },
    // across partitions: concatenate the partial lists
    { case (aggrList1, aggrList2) => aggrList1 ++ aggrList2 }
  )
  .map({ case (key, aggrList) =>
    aggrList
      .sortBy({ case (index, line) => index })  // restore original line order within the group
      .map({ case (index, line) => line })      // drop the index, keep the text
      .mkString("")                             // concatenate the three lines into one
  })
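
For reference, a minimal sketch of checking the result on the driver (the collect() call below is illustrative and assumes the sample input above). Note that aggregateByKey hash-partitions by key, so the merged lines may not come back in their original order; if that matters, a sortByKey() before the final map would preserve it.

// Hypothetical check: with the sample input this prints ABC and QWOOP
// (group order is not guaranteed unless the RDD is sorted by key first)
yourRequiredRdd.collect().foreach(println)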