Scala: How do I put quotes around the text in an RDD?
I have an RDD that contains several lines of text. The text comes from a text file and contains newline characters (returns). I want to put a quote before the first word and after the last word in the RDD.
val fileRdd = sc.textFile("file://data/sample.txt")
val newRdd = fileRdd
Sample input from the text file. Note that the text file contains newlines or returns:
I once did an interview for the Banbury Herald. I must look it out one of these days, for the biography.
Strange chap they sent me. A boy, really. As tall as a man, but with the puppy fat of youth.
It was nightfall now and I must go home.
Expected output in the RDD:
“I once did an interview for the Banbury Herald. I must look it out one of these days, for the biography.
Strange chap they sent me. A boy, really. As tall as a man, but with the puppy fat of youth.
It was nightfall now and I must go home.”
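What I want is to put quotes around the first and last words and store the result in a new RDD. Can you help me solve this?

You can, as long as there is no shuffle upstream, but it doesn't make much sense. If you find yourself thinking about order, beginning, end, and similar concepts, you are in a sequential mindset, which is not a good fit for Spark. That said, given: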
val fileRdd = sc.parallelize(Seq(
"I once did an interview for the Banbury Herald. I must look it out one of these days, for the biography.",
"Strange chap they sent me. A boy, really. As tall as a man, but with the puppy fat of youth.",
"It was nightfall now and I must go home."
))
Find the count:
val n = fileRdd.count
Then use zipWithIndex and map:
val withQuotes = fileRdd.zipWithIndex.map {
case (line, 0) => "\"" + line
case (line, m) if m == n - 1 => line + "\""
case (line, _) => line
}
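One edge case in the pattern match above: if the RDD has only a single line (n == 1), the first case matches and the line gets an opening quote but no closing quote. A minimal sketch of one way to handle this is to decide the two quotes independently. The names QuoteEnds, addQuote, and quoteAll are hypothetical helpers, not part of the original answer or of Spark, and quoteAll uses a plain Seq as a local stand-in for the RDD so the logic can be checked without a cluster.

```scala
object QuoteEnds {
  // Hypothetical helper (not from the original answer): picks the quotes for a
  // line from its zero-based index `idx` and the total number of lines `n`.
  def addQuote(line: String, idx: Long, n: Long): String = {
    val open  = if (idx == 0L) "\"" else ""     // first line gets the opening quote
    val close = if (idx == n - 1) "\"" else ""  // last line gets the closing quote
    open + line + close                         // a single line gets both
  }

  // Local stand-in for fileRdd.zipWithIndex.map { ... }, using a Seq instead of an RDD.
  def quoteAll(lines: Seq[String]): Seq[String] = {
    val n = lines.size.toLong
    lines.zipWithIndex.map { case (line, i) => addQuote(line, i.toLong, n) }
  }
}
```

With the RDD from the answer, the same helper could be plugged in as fileRdd.zipWithIndex.map { case (line, i) => QuoteEnds.addQuote(line, i, n) }, assuming n was computed with fileRdd.count as shown earlier.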
Did that help? Yes, it did. Thank you very much.