Apache Spark: how can we sort by two different fields in Spark Core?

I am doing some basic programming with Spark.

Input file:

2008,20
2008,40
2000,10
2000,30
2001,9

My Spark code:

scala> val dataRDD = sc.textFile("/user/cloudera/inputfiles/year.txt")
scala> val mapRDD = dataRDD.map(elem => elem.split(","))
scala> val keyValueRDD = mapRDD.map( elem => (elem(0),elem(1)))
scala> val sortRDD = keyValueRDD.sortByKey(true,1)
res29: Array[(String, String)] = Array((2000,30), (2000,10), (2001,9), (2008,20), (2008,40))

I want the output sorted by year in ascending order, with the values within each year sorted in descending order.

Expected output:

2000,30
2000,10
2001,9
2008,40
2008,20

Can anyone help me get this result?

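You have to define a class that holds both the year and the value for that year. The class should extend Ordered and override its compare method. Then use objects of this class as the keys and apply sortByKey.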

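// Composite key: year ascending, then value descending within the same year.
// A case class needs no `new` and is Serializable, so Spark's default
// Java serializer can ship the keys during the sortByKey shuffle.
case class TwoKeys(first: Int, second: Int) extends Ordered[TwoKeys] {
  def compare(that: TwoKeys): Int = {
    if (first == that.first) {
      that.second - second // same year: operands swapped, so values sort descending
    } else {
      first - that.first   // different years: years sort ascending
    }
  }
}
// ... dataRDD and mapRDD as defined in the question
// split(",") yields Strings, so convert them to Int before building the key
val keyValueRDD = mapRDD.map(elem => (TwoKeys(elem(0).toInt, elem(1).toInt), TwoKeys(elem(0).toInt, elem(1).toInt)))
val sortRDD = keyValueRDD.sortByKey(true, 1)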
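The ordering itself can be checked without a cluster. The sketch below is a plain-Scala (no Spark) approximation under the assumption that TwoKeys behaves the same in a local List as in an RDD key; the object TwoKeysLocalCheck and its members are hypothetical names for illustration. It sorts the sample lines with the Ordering that Scala derives from Ordered[TwoKeys], and also with the standard library's Ordering.Tuple2 (first field ascending, second field reversed) as a swapped-in alternative that avoids writing compare by hand.

```scala
// Local (non-Spark) check of the TwoKeys ordering on the sample input.
// TwoKeys mirrors the answer's class; TwoKeysLocalCheck is a hypothetical name.
case class TwoKeys(first: Int, second: Int) extends Ordered[TwoKeys] {
  def compare(that: TwoKeys): Int =
    if (first == that.first) that.second - second // same year: value descending
    else first - that.first                       // different years: year ascending
}

object TwoKeysLocalCheck {
  // Lines of year.txt from the question
  val lines: List[String] = List("2008,20", "2008,40", "2000,10", "2000,30", "2001,9")

  // Sort using the Ordering that Scala derives from Ordered[TwoKeys]
  def sortedWithTwoKeys: List[(Int, Int)] =
    lines
      .map(_.split(","))
      .map(a => TwoKeys(a(0).trim.toInt, a(1).trim.toInt))
      .sorted
      .map(k => (k.first, k.second))

  // Alternative: built-in tuple Ordering, first field ascending, second descending
  val yearAscValueDesc: Ordering[(Int, Int)] =
    Ordering.Tuple2(Ordering.Int, Ordering.Int.reverse)

  def sortedWithTuples: List[(Int, Int)] =
    lines
      .map(_.split(","))
      .map(a => (a(0).trim.toInt, a(1).trim.toInt))
      .sorted(yearAscValueDesc)

  def main(args: Array[String]): Unit = {
    // Prints the rows in "year,value" form, one per line
    sortedWithTwoKeys.foreach { case (year, value) => println(s"$year,$value") }
    // Both approaches should agree
    println(sortedWithTwoKeys == sortedWithTuples)
  }
}
```

Both lists should match the expected output from the question (2000,30 / 2000,10 / 2001,9 / 2008,40 / 2008,20). The tuple-based Ordering can in principle be passed to Spark's RDD.sortBy as its implicit Ordering instead of defining a key class, but that variant is not exercised here.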

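Yes, it works, but I would like some explanation of the compare method. What is that.second - second? Are we subtracting? Please explain the logic you wrote in the compare method.

compare returns a negative number, zero, or a positive number, meaning this object is less than, equal to, or greater than that, respectively (-1, 0 and 1 are the typical values, but only the sign matters). Subtraction is a quick way to produce such a number: first - that.first puts years in ascending order, and swapping the operands in that.second - second reverses the order, so values within the same year come out in descending order.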
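To make the sign convention concrete, here is a small plain-Scala sketch; the object CompareSignDemo and its helper bySubtraction are hypothetical, for illustration only. It evaluates the two subtractions from compare on pairs taken from the sample data, then shows the one caveat of the subtraction trick: it can overflow for values near Int.MinValue or Int.MaxValue and report the wrong sign, which java.lang.Integer.compare avoids.

```scala
// Hypothetical helper mirroring the subtraction used in TwoKeys.compare
object CompareSignDemo {
  def bySubtraction(a: Int, b: Int): Int = a - b

  def main(args: Array[String]): Unit = {
    // TwoKeys(2000,30) vs TwoKeys(2000,10): that.second - second = 10 - 30 = -20,
    // a negative result, so (2000,30) counts as "less" and is placed first
    println(bySubtraction(10, 30))
    // TwoKeys(2000,30) vs TwoKeys(2008,40): first - that.first = 2000 - 2008 = -8,
    // so the 2000 row is placed before the 2008 row
    println(bySubtraction(2000, 2008))
    // Caveat: for extreme values the subtraction overflows and flips the sign;
    // Int.MinValue - 1 wraps around to Int.MaxValue, a positive number
    println(bySubtraction(Int.MinValue, 1))
    // Integer.compare returns a correctly signed result without overflowing
    println(Integer.compare(Int.MinValue, 1))
  }
}
```

For years and small counts like the ones in year.txt, the subtraction is safe. Writing Integer.compare(first, that.first) and Integer.compare(that.second, second) inside compare would give the same order on this data while also being safe for arbitrary Int values.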