Scala: merging every 3 lines of an RDD into one line in Spark
I have a data set where each line looks like this:
A
B
C
QW
OO
P
...
Now I want to merge every three lines into one, like this:
ABC
QWOOP
...
What code should I write to do this?
e.g. val data = sc.textFile("path")
Thanks.
val lineRdd = sc.textFile("path")
val yourRequiredRdd = lineRdd
  .zipWithIndex                                               // pair each line with its global index
  .map({ case (line, index) => (index / 3, (index, line)) })  // key every 3 consecutive lines together
  .aggregateByKey(List.empty[(Long, String)])(
    { case (aggrList, (index, line)) => (index, line) :: aggrList },  // collect lines per key within a partition
    { case (aggrList1, aggrList2) => aggrList1 ++ aggrList2 }         // merge partial lists across partitions
  )
  .map({ case (key, aggrList) =>
    aggrList
      .sortBy({ case (index, line) => index })  // restore original line order within the group
      .map({ case (index, line) => line })
      .mkString("")                             // concatenate the 3 lines into one string
  })
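The same group-by-index/3 idea can be sketched locally with plain Scala collections (no Spark needed), which makes the logic easy to verify. This is a hypothetical illustration, not the Spark answer itself; names like `lines` and `merged` are mine:

```scala
// Local demonstration of the grouping logic: bucket every 3 consecutive
// lines by index / 3, keep bucket order, then concatenate each bucket.
val lines = List("A", "B", "C", "QW", "OO", "P")
val merged = lines
  .zipWithIndex                                 // pair each line with its index
  .groupBy { case (_, index) => index / 3 }     // bucket every 3 consecutive lines
  .toList
  .sortBy { case (key, _) => key }              // restore bucket order (groupBy is unordered)
  .map { case (_, group) => group.map(_._1).mkString("") }
// merged == List("ABC", "QWOOP")
```

Note the `sortBy` on the bucket key: `groupBy` returns an unordered `Map`, just as Spark's shuffle does not preserve key order, so both versions must sort before concatenating.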