Unexpected output from the reduceByKey function in Apache Spark
I am writing code that needs to aggregate values by key using the reduceByKey function.

// mapToPair code
JavaPairRDD<String,Integer> taxiPair = taxiData.mapToPair(
    x -> {
        if (!x.isEmpty()) {
            String[] split = x.split(",");
            x = split[9]; // Extracting index value 9 (payment type)
        }
        return new Tuple2<String,Integer>("Payment:" + x, 1);
    }
);
List<Tuple2<String,Integer>> sample = taxiPair.take(10);
for (Tuple2<String,Integer> t : sample) {
    System.out.println(t._1 + "," + t._2);
}
Based on my understanding, once reduceByKey completes, it should give the following result:
Payment:1,9
Payment:2,1
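(For intuition: reduceByKey groups the (key, 1) pairs and folds the values per key; the expected counts above can be simulated without Spark using a plain HashMap merge. The sample values below are hypothetical, standing in for the payment-type column at index 9.)

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountByKeyDemo {
    // Per-key counting with Map.merge, equivalent to reduceByKey with (x, y) -> x + y
    static Map<String, Integer> countByKey(List<String> payments) {
        Map<String, Integer> counts = new HashMap<>();
        for (String p : payments) {
            counts.merge("Payment:" + p, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical payment-type values, as if extracted from column index 9
        List<String> payments = List.of("1","1","1","1","1","1","1","1","1","2");
        Map<String, Integer> counts = countByKey(payments);
        System.out.println("Payment:1," + counts.get("Payment:1")); // Payment:1,9
        System.out.println("Payment:2," + counts.get("Payment:2")); // Payment:2,1
    }
}
```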
However, the actual output is different.

// reduceByKey code
JavaPairRDD<String,Integer> taxiReduce = taxiPair.reduceByKey(
    (x, y) -> (y + y)
);
List<Tuple2<String,Integer>> sample2 = taxiReduce.collect();
for (Tuple2<String,Integer> t : sample2) {
    System.out.println(t._1 + "," + t._2);
}
There is a typo in this statement; it needs "x+y" here instead of "y+y":
JavaPairRDD<String,Integer> taxiReduce = taxiPair.reduceByKey(
    (x, y) -> (y + y));
It should be:

(x, y) -> (x + y)
For reference, running the original code with the (y+y) reducer:

JavaPairRDD<String,Integer> taxiReduce = taxiPair.reduceByKey(
    (x, y) -> (y + y)
);
List<Tuple2<String,Integer>> sample2 = taxiReduce.collect();
for (Tuple2<String,Integer> t : sample2) {
    System.out.println(t._1 + "," + t._2);
}

produces the following output, where every count collapses to 2:
Payment:3,2
Payment:2,2
Payment:,2
Payment:4,2
Payment:1,2
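Why every count becomes 2: reduceByKey folds the values for each key pairwise, and (x, y) -> (y + y) ignores the accumulator x, so every step returns 1 + 1 = 2 regardless of how many values the key has. A minimal plain-Java sketch (no Spark required; the fold helper below is hypothetical, only mimicking how reduceByKey combines the values of a single key):

```java
import java.util.List;
import java.util.function.BinaryOperator;

public class ReduceDemo {
    // Left fold mimicking how reduceByKey combines the values for one key
    static int fold(List<Integer> values, BinaryOperator<Integer> f) {
        int acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = f.apply(acc, values.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> ones = List.of(1, 1, 1, 1, 1); // five values for one key

        // Buggy reducer: ignores the accumulator x, always returns y + y = 2
        System.out.println(fold(ones, (x, y) -> y + y)); // 2

        // Correct reducer: sums the running total x with the next value y
        System.out.println(fold(ones, (x, y) -> x + y)); // 5
    }
}
```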