How to use mapPartitions in Scala? (Java, Scala, Hadoop, Apache Spark)


I am trying to use mapPartitions in Scala, but I get the following error:

[error]  found   : Unit
[error]  required: Iterator[?]
[error] Error occurred in an application involving default arguments.
[error]         rdd.mapPartitions(showParts)
I call mapPartitions as follows:

rdd.mapPartitions(showParts)
where showParts is defined as follows:

def showParts(iter: Iterator[(Long, Array[String])]) = {
  while (iter.hasNext) {
    val cur = iter.next()
    // Do something with cur
  }
}

What is the correct way to use mapPartitions here?

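The problem is that the UDF passed to mapPartitions must have a return type of Iterator[U]. Your current code ends with a while loop and returns nothing, so its type is Unit, which is exactly what the compiler error reports.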

If you want to get an empty RDD after running mapPartitions, you can do the following:

def showParts(iter: Iterator[(Long, Array[String])]) = {
  while (iter.hasNext) {
    val cur = iter.next()
    // Do something with cur
  }
  // Return an Iterator[U]
  Iterator.empty
}
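
If the elements should survive the call rather than be dropped, one option is to do the per-element work inside map and return the mapped iterator. The sketch below is only an illustration: showPartsPassThrough and the seen buffer are hypothetical names, and the buffer merely stands in for "do something with cur". Returning iter itself after a while loop would not work, because the loop has already consumed it.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for "do something with cur": remember the keys we saw.
// This is for local illustration only; on a real cluster a buffer captured in
// the closure is copied to each executor and is not shared with the driver.
val seen = ArrayBuffer.empty[Long]

def showPartsPassThrough(
    iter: Iterator[(Long, Array[String])]
): Iterator[(Long, Array[String])] =
  iter.map { cur =>
    seen += cur._1 // side effect; runs lazily, as the result is consumed
    cur            // pass the element through unchanged
  }
```

Because the result type is an Iterator, a function shaped like this can be passed to rdd.mapPartitions in the same way as showParts.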

You need to return an Iterator from your showParts function:

def onlyEven(numbers: Iterator[Int]) : Iterator[Int] = 
  numbers.filter(_ % 2 == 0)

def partitionSize(numbers: Iterator[Int]) : Iterator[Int] = 
  Iterator.single(numbers.length)

val rdd = sc.parallelize(0 to 10)
rdd.mapPartitions(onlyEven).collect()
// Array[Int] = Array(0, 2, 4, 6, 8, 10)

rdd.mapPartitions(partitionSize).collect()
// Array[Int] = Array(2, 3, 3, 3)   (one entry per partition; here the RDD has 4 partitions)
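
To try partition functions without a cluster, the per-partition behaviour can be imitated with plain Scala collections. This is a rough sketch under stated assumptions: applyPerPartition is a hypothetical helper, not part of Spark, and the four hand-written partitions are chosen to match the partition sizes in the output above.

```scala
def onlyEven(numbers: Iterator[Int]): Iterator[Int] =
  numbers.filter(_ % 2 == 0)

def partitionSize(numbers: Iterator[Int]): Iterator[Int] =
  Iterator.single(numbers.length)

// Hypothetical local stand-in for RDD.mapPartitions: f is called once per
// "partition" and the resulting iterators are concatenated in order.
def applyPerPartition[T, U](partitions: Seq[Seq[T]])(f: Iterator[T] => Iterator[U]): Seq[U] =
  partitions.flatMap(p => f(p.iterator))

// Assumed split of 0 to 10 into 4 partitions of sizes 2, 3, 3, 3.
val partitions = Seq(Seq(0, 1), Seq(2, 3, 4), Seq(5, 6, 7), Seq(8, 9, 10))

val evens = applyPerPartition(partitions)(onlyEven)      // Seq(0, 2, 4, 6, 8, 10)
val sizes = applyPerPartition(partitions)(partitionSize) // Seq(2, 3, 3, 3)
```

The point of the sketch is that the function sees a whole partition at a time: onlyEven produces many elements per partition, while partitionSize produces exactly one.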

Is it possible to return an Array instead?
You can always convert an Array to an Iterator by calling toIterator on it; for example, Array(1, 2, 3, 4).toIterator gives you an Iterator[Int].
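
As a minimal sketch of that conversion inside a partition function: topTwo below is a hypothetical example that builds an Array per partition and then hands back an Iterator over it. It uses .iterator, which to my knowledge is the preferred spelling on Scala 2.13 and later, where toIterator is deprecated; on older versions either should work.

```scala
// Hypothetical partition function: keep the two largest values of a partition.
def topTwo(numbers: Iterator[Int]): Iterator[Int] = {
  val all: Array[Int] = numbers.toArray                // pulls the whole partition into memory
  val largestFirst = all.sorted(Ordering[Int].reverse) // sort descending
  largestFirst.take(2).iterator                        // Array => Iterator
}
```

Note that toArray materializes the whole partition, so this shape only suits partitions that fit comfortably in memory.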