Sorting: Spark repartitionAndSortWithinPartitions with tuples
I am trying to partition HBase rows following this example. However, my data is already stored as (String, String, String), where the first element is the row key, the second the column name, and the third the column value. I tried writing an implicit Ordering to satisfy the OrderedRDD implicit:
implicit val caseInsensitiveOrdering: Ordering[(String, String, String)] = new Ordering[(String, String, String)] {
override def compare(x: (String, String, String), y: (String, String, String)): Int = ???
}
But repartitionAndSortWithinPartitions was still not available. Is there any way to use this method with this tuple?

The RDD must have a key and a value, not just a value, for example:
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

// The key must come first: RDD[((String, String, String), Int)]
val data = List((("5", "6", "1"), 1))
val rdd: RDD[((String, String, String), Int)] = sparkContext.parallelize(data)

// Placeholder ordering: always returning 1 does not define a valid total
// order; use a real comparison in practice.
implicit val caseInsensitiveOrdering: Ordering[(String, String, String)] =
  new Ordering[(String, String, String)] {
    override def compare(x: (String, String, String), y: (String, String, String)): Int = 1
  }

rdd.repartitionAndSortWithinPartitions(new HashPartitioner(2))
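The compare above is only a placeholder that always returns 1. Since the val is named caseInsensitiveOrdering, a minimal sketch of a real implementation could compare the three fields lexicographically while ignoring case; this is plain Scala and can be tried without Spark:

```scala
// Sketch of a case-insensitive lexicographic Ordering for the
// (rowKey, columnName, columnValue) tuple. Field order matters:
// row key first, then column name, then column value.
val caseInsensitiveOrdering: Ordering[(String, String, String)] =
  new Ordering[(String, String, String)] {
    override def compare(x: (String, String, String), y: (String, String, String)): Int = {
      val c1 = x._1.compareToIgnoreCase(y._1) // row key first
      if (c1 != 0) return c1
      val c2 = x._2.compareToIgnoreCase(y._2) // then column name
      if (c2 != 0) return c2
      x._3.compareToIgnoreCase(y._3)          // finally column value
    }
  }

// Sorting a small list with it:
val rows = List(("B", "x", "1"), ("a", "y", "2"), ("B", "w", "3"))
println(rows.sorted(caseInsensitiveOrdering))
// List((a,y,2), (B,w,3), (B,x,1))
```

Note that Scala already provides an implicit Ordering for tuples of ordered components, so a custom instance is only needed when the default (case-sensitive) ordering is not what you want.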
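To see what repartitionAndSortWithinPartitions does with such keyed records, here is a plain-Scala simulation (no Spark required): each key is assigned to a partition by hash, mimicking HashPartitioner, and each partition is then sorted by the key Ordering. All names here are illustrative, not Spark API.

```scala
// Plain-Scala sketch: simulate repartitionAndSortWithinPartitions for
// ((String, String, String), Int) pairs. Partition assignment mimics
// HashPartitioner (hashCode modulo partition count); each partition is
// then sorted by the implicit tuple ordering.
type Key = (String, String, String)

// Scala's built-in lexicographic ordering for Tuple3 of Strings.
val ord: Ordering[Key] = Ordering[(String, String, String)]

val data = List(
  (("5", "6", "1"), 1),
  (("2", "9", "0"), 2),
  (("5", "1", "7"), 3)
)

val numPartitions = 2
def partitionOf(k: Key): Int = math.abs(k.hashCode) % numPartitions

// Group by partition, then sort each partition's records by key.
val partitions: Map[Int, List[(Key, Int)]] =
  data.groupBy { case (k, _) => partitionOf(k) }
      .map { case (p, rows) => p -> rows.sortBy(_._1)(ord) }

partitions.foreach { case (p, rows) => println(s"partition $p: $rows") }
```

This mirrors the contract of the real method: records are shuffled to the partition chosen by the Partitioner, and sorting happens only within each partition, never globally.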