Unexpected output from the reduceByKey function in Apache Spark
I am writing code that needs to aggregate values by key using the reduceByKey function.

// mapToPair code
JavaPairRDD<String,Integer> taxiPair = taxiData.mapToPair(
    x -> {
        if (!x.isEmpty()) {
            String[] split = x.split(",");
            x = split[9]; // Extracting index value 9 (payment type)
        }
        return new Tuple2<String,Integer>("Payment:" + x, 1);
    }
);
List<Tuple2<String,Integer>> sample = taxiPair.take(10);
for (Tuple2<String,Integer> t : sample) {
    System.out.println(t._1 + "," + t._2);
}
Based on my understanding, once reduceByKey completes, it should give the following result:
Payment:1,9
Payment:2,1
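(For intuition: reduceByKey groups the (key, 1) pairs and folds the values per key; the expected counts above can be simulated without Spark using a plain HashMap merge. The sample values below are hypothetical, standing in for the payment-type column at index 9.)

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountByKeyDemo {
    // Per-key counting with Map.merge, equivalent to reduceByKey with (x, y) -> x + y
    static Map<String, Integer> countByKey(List<String> payments) {
        Map<String, Integer> counts = new HashMap<>();
        for (String p : payments) {
            counts.merge("Payment:" + p, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical payment-type values, as if extracted from column index 9
        List<String> payments = List.of("1","1","1","1","1","1","1","1","1","2");
        Map<String, Integer> counts = countByKey(payments);
        System.out.println("Payment:1," + counts.get("Payment:1")); // Payment:1,9
        System.out.println("Payment:2," + counts.get("Payment:2")); // Payment:2,1
    }
}
```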
However, the actual output is different.

// reduceByKey code
JavaPairRDD<String,Integer> taxiReduce = taxiPair.reduceByKey(
    (x, y) -> (y + y)
);
List<Tuple2<String,Integer>> sample2 = taxiReduce.collect();
for (Tuple2<String,Integer> t : sample2) {
    System.out.println(t._1 + "," + t._2);
}
There is a typo in this statement; it needs "x+y" here instead of "y+y":
JavaPairRDD<String,Integer> taxiReduce = taxiPair.reduceByKey(
    (x, y) -> (y + y));
It should be:

(x, y) -> (x + y)
For reference, running the original code with the (y+y) reducer:

JavaPairRDD<String,Integer> taxiReduce = taxiPair.reduceByKey(
    (x, y) -> (y + y)
);
List<Tuple2<String,Integer>> sample2 = taxiReduce.collect();
for (Tuple2<String,Integer> t : sample2) {
    System.out.println(t._1 + "," + t._2);
}

produces the following output, where every count collapses to 2:
Payment:3,2
Payment:2,2
Payment:,2
Payment:4,2
Payment:1,2
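Why every count becomes 2: reduceByKey folds the values for each key pairwise, and (x, y) -> (y + y) ignores the accumulator x, so every step returns 1 + 1 = 2 regardless of how many values the key has. A minimal plain-Java sketch (no Spark required; the fold helper below is hypothetical, only mimicking how reduceByKey combines the values of a single key):

```java
import java.util.List;
import java.util.function.BinaryOperator;

public class ReduceDemo {
    // Left fold mimicking how reduceByKey combines the values for one key
    static int fold(List<Integer> values, BinaryOperator<Integer> f) {
        int acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = f.apply(acc, values.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> ones = List.of(1, 1, 1, 1, 1); // five values for one key

        // Buggy reducer: ignores the accumulator x, always returns y + y = 2
        System.out.println(fold(ones, (x, y) -> y + y)); // 2

        // Correct reducer: sums the running total x with the next value y
        System.out.println(fold(ones, (x, y) -> x + y)); // 5
    }
}
```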