Scala: returning an iterator from mapPartitions
Can anyone help me get mapPartitions to accept the iterator returned by the listWords() method?
import java.util
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MapPartitionExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapPartitionExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val input: RDD[String] = sc.parallelize(List("ABC", "DEF", "GHU", "YHG"))
    val x = input.mapPartitions(word => listWords(word))
  }

  def listWords(words: Iterator[String]): util.Iterator[String] = {
    val arrList = new util.ArrayList[String]()
    while (words.hasNext) {
      arrList.add(words.next())
    }
    return arrList.iterator()
  }
}
Expected: Iterable[NotInferU], but java.util.Iterator[String] is being returned.

You need to convert the java.util.Iterator into a Scala Iterator by importing scala.collection.JavaConversions:
def listWords(words: Iterator[String]): Iterator[String] = {
  // Implicit conversions let the java.util.ArrayList be used as a Scala collection
  import scala.collection.JavaConversions._
  val arrList = new util.ArrayList[String]()
  while (words.hasNext) {
    arrList.add(words.next())
  }
  arrList.toList.iterator
}
The rest of the code stays as it is.
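Note that scala.collection.JavaConversions was deprecated in Scala 2.12 and removed in 2.13. As a minimal sketch of the same idea with an explicit conversion, the version below uses scala.collection.JavaConverters and calls .asScala on the Java iterator. The wrapper object name ListWordsAsScala is only there to make the snippet self-contained; on Scala 2.13 the preferred import would be scala.jdk.CollectionConverters._ instead.

```scala
import java.util
import scala.collection.JavaConverters._

object ListWordsAsScala {
  // Buffers the partition into a java.util.ArrayList, as in the question,
  // then converts the resulting java.util.Iterator explicitly.
  def listWords(words: Iterator[String]): Iterator[String] = {
    val arrList = new util.ArrayList[String]()
    while (words.hasNext) {
      arrList.add(words.next())
    }
    // .asScala wraps the java.util.Iterator as a scala.collection.Iterator
    arrList.iterator().asScala
  }
}
```

With this signature, input.mapPartitions(ListWordsAsScala.listWords) should type-check, since the function now returns a Scala Iterator[String].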
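I hope the answer is helpful.

The return type of the function used in mapPartitions should be scala.collection.Iterator, not java.util.Iterator. I don't see much point in your current code, but you could use a Scala mutable collection: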
import scala.collection.mutable.ArrayBuffer

def listWords(words: Iterator[String]): Iterator[String] = {
  val arr = ArrayBuffer[String]()
  while (words.hasNext) {
    arr += words.next()
  }
  arr.toIterator
}
Personally, I would just use map:
def listWords(words: Iterator[String]): Iterator[String] = {
  // Some init code
  words.map(someFunction)
}
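To make the map-based variant concrete, here is a small sketch that runs without Spark by passing a plain Scala Iterator in place of a partition. The object name ListWordsWithMap, the body of someFunction (lowercasing), and the prefix value are all hypothetical stand-ins chosen for illustration; they are not part of the original question.

```scala
object ListWordsWithMap {
  // Hypothetical per-element transformation standing in for someFunction
  def someFunction(word: String): String = word.toLowerCase

  def listWords(words: Iterator[String]): Iterator[String] = {
    // Init code: evaluated once per call, i.e. once per partition
    // when this function is passed to mapPartitions
    val prefix = "word-" // hypothetical setup value
    // map on an Iterator is lazy, so elements are transformed as they are
    // consumed and the partition is never buffered in memory
    words.map(w => prefix + someFunction(w))
  }
}
```

Because map returns a scala.collection.Iterator directly, no Java collection or conversion is involved, and the function can be handed to mapPartitions as is.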
Please accept the answer, and pick whichever solution you prefer. :)