Scala: Why does repartitionAndSortWithinPartitions not sort?

Here is what I'm doing:

val rddkv = sc.parallelize(List(("k1",1),("k2",2),("k1",2),("k3",5),("k3",1)))
    //rddkv.collect
    //Array[(String, Int)] = Array((k1,1), (k2,2), (k1,2), (k3,5), (k3,1))

rddkv.repartitionAndSortWithinPartitions(new org.apache.spark.RangePartitioner(3,rddkv)).mapPartitionsWithIndex( (i,iter_p) => iter_p.map(x=>" index="+i+" value="+x)).collect
    //Array[String] = Array(" index=0 value=(k1,1)", " index=0 value=(k1,2)", " index=1 value=(k2,2)", " index=1 value=(k3,5)", " index=1 value=(k3,1)")

Note that the values within each partition are not sorted. Why is that? What am I missing?

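The RDD actually is sorted; you have probably misunderstood how the method works. It operates on an RDD of key-value pairs (K, V), where K is the key and V is the value. It repartitions the data and then sorts the records within each resulting partition by key only. The values take no part in the comparison, so nothing puts pairs that share a key into value order.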
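To see why the values come out unsorted, here is a minimal local model of the method's contract, using plain Scala collections instead of an RDD. It is a sketch under two assumptions: `partitionOf` is a hypothetical stand-in for the RangePartitioner (whose real boundaries are computed by sampling the data), chosen so the split matches the output in the question, and the model keeps equal keys in arrival order, which Spark does not promise. Only the key is compared, so the values never influence the order.

```scala
// Local model of what repartitionAndSortWithinPartitions does, using plain
// Scala collections instead of an RDD (illustration only, not Spark code).
val data = List(("k1", 1), ("k2", 2), ("k1", 2), ("k3", 5), ("k3", 1))

// Hypothetical stand-in for RangePartitioner: "k1" goes to partition 0 and
// every other key to partition 1, mirroring the split in the question.
def partitionOf(key: String): Int = if (key == "k1") 0 else 1

def repartitionAndSortByKey(
    records: List[(String, Int)],
    numPartitions: Int
): Vector[List[(String, Int)]] =
  Vector.tabulate(numPartitions) { p =>
    records
      .filter { case (k, _) => partitionOf(k) == p } // step 1: route each record to its partition
      .sortBy(_._1) // step 2: sort by key only; values are never compared
  }

val partitions = repartitionAndSortByKey(data, numPartitions = 2)

// Print in the same format as the question's mapPartitionsWithIndex call.
for ((part, i) <- partitions.zipWithIndex; x <- part)
  println(s" index=$i value=$x")
```

Running it prints the same five lines as the question, with (k3,5) still ahead of (k3,1), because the model compares keys only and leaves equal keys in the order they arrived.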

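Look at the order of your output: (k1,1), (k1,2), (k2,2), (k3,5), (k3,1). It is correctly sorted by key: k1 fills partition 0, and within partition 1, k2 comes before k3.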
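One way to confirm this is to check the collected pairs against two orderings: by key alone, and by the full (key, value) tuple. Because the RangePartitioner in the question places lower keys in lower-numbered partitions, the concatenated output can be checked as a whole. This is a plain-Scala check on the pairs copied from the question's output; `isSortedBy` is a hypothetical helper, not a Spark API.

```scala
// Pairs copied from the question's output, in the order they were printed.
val output = List(("k1", 1), ("k1", 2), ("k2", 2), ("k3", 5), ("k3", 1))

// Hypothetical helper: true if every adjacent pair is in non-decreasing
// order under the implicit Ordering for the element type.
def isSortedBy[A](xs: List[A])(implicit ord: Ordering[A]): Boolean =
  xs.zip(xs.drop(1)).forall { case (a, b) => ord.lteq(a, b) }

val sortedByKey = isSortedBy(output.map(_._1)) // compares keys only
val sortedByKeyThenValue = isSortedBy(output) // compares (key, value) tuples

println(s"sorted by key: $sortedByKey")
println(s"sorted by (key, value): $sortedByKeyThenValue")
```

The first check passes and the second fails only at (k3,5) followed by (k3,1), which is exactly the ordering the method does not promise.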

If you only want to sort by value, regardless of which partition the records end up in, just use
rdd.sortBy(_._2)
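RDD.sortBy takes a function that extracts the sort key, the same shape as `sortBy` on Scala collections, so `_._2` means "order by the value". The sketch below uses a local List as a stand-in for the RDD to show the ordering that key function produces on the question's data; on an RDD, `sortBy` additionally range-partitions the records so the result is ordered across partitions. Records with equal values may come out in either order, so the check looks only at the values. The second line is an assumption about what you might want instead: if the goal is keys in order with values sorted within each key, sorting by the whole tuple does that, because tuple ordering compares the key first and then the value.

```scala
// Local stand-in for the RDD from the question (plain List, not Spark).
val pairs = List(("k1", 1), ("k2", 2), ("k1", 2), ("k3", 5), ("k3", 1))

// Same key-extraction function as rdd.sortBy(_._2): order by the value only.
val byValue = pairs.sortBy(_._2)

// Tuple ordering compares the key first, then the value, so this orders by
// key and breaks ties between equal keys by value.
val byKeyThenValue = pairs.sortBy(identity)

println(byValue)
println(byKeyThenValue)
```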