Spark Java Accumulator not incrementing


Just taking my first small steps with Spark in Java. Below is a word count program that uses a stop word list, so that words appearing in the list are skipped. I have two accumulators to count the skipped and unskipped words.

However, the Sysout at the end of the program always shows both accumulator values as 0.

Please point out where I am going wrong.

import java.io.FileNotFoundException;
import java.util.Arrays;

import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.broadcast.Broadcast;

public static void main(String[] args) throws FileNotFoundException {

        SparkConf conf = new SparkConf();
        conf.setAppName("Third App - Word Count WITH BroadCast and Accumulator");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        JavaRDD<String> fileRDD = jsc.textFile("hello.txt");

        // Split each line into words.
        JavaRDD<String> words = fileRDD.flatMap(new FlatMapFunction<String, String>() {

            public Iterable<String> call(String aLine) throws Exception {
                return Arrays.asList(aLine.split(" "));
            }
        });

        String[] stopWordArray = getStopWordArray();

        // Accumulators counting skipped (stop) words and kept words.
        final Accumulator<Integer> skipAccumulator = jsc.accumulator(0);
        final Accumulator<Integer> unSkipAccumulator = jsc.accumulator(0);

        final Broadcast<String[]> stopWordBroadCast = jsc.broadcast(stopWordArray);

        // Keep only words that are not in the broadcast stop word list,
        // updating the accumulators as a side effect.
        JavaRDD<String> filteredWords = words.filter(new Function<String, Boolean>() {

            public Boolean call(String inString) throws Exception {
                boolean filterCondition = !Arrays.asList(stopWordBroadCast.getValue()).contains(inString);
                if (!filterCondition) {
                    System.out.println("Filtered a stop word ");
                    skipAccumulator.add(1);
                } else {
                    unSkipAccumulator.add(1);
                }
                return filterCondition;
            }
        });

        System.out.println("$$$$$$$$$$$$$$$Filtered Count " + skipAccumulator.value());
        System.out.println("$$$$$$$$$$$$$$$ UN Filtered Count " + unSkipAccumulator.value());

        /* rest of code - works fine */
        jsc.stop();
        jsc.close();
}
----------- EDIT ------------------

The rest of the code, from the commented section:

// Map each kept word to a (word, 1) pair.
JavaPairRDD<String, Integer> wordOccurrence = filteredWords.mapToPair(new PairFunction<String, String, Integer>() {

    public Tuple2<String, Integer> call(String inWord) throws Exception {
        return new Tuple2<String, Integer>(inWord, 1);
    }
});

// Sum the occurrence counts per word.
JavaPairRDD<String, Integer> summed = wordOccurrence.reduceByKey(new Function2<Integer, Integer, Integer>() {

    public Integer call(Integer a, Integer b) throws Exception {
        return a + b;
    }
});

// saveAsTextFile is an action, so this is what finally triggers execution.
summed.saveAsTextFile("hello-out");

You missed posting the important part: /* rest of code - works fine */. I can guarantee that you call some action in that remaining code, and that action triggers the DAG to execute the code with the accumulators. Try adding a filteredWords.collect() before the println and you should see the output. Remember that Spark is lazy on transformations and only executes on actions.
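A minimal sketch of that fix, reusing the variable names from the question (count() is used here, but any action such as collect() or first() would equally trigger execution):

// Force the lazy filter transformation to run by invoking an action.
long keptWords = filteredWords.count();

// The executors have now applied the accumulator updates.
System.out.println("$$$$$$$$$$$$$$$Filtered Count " + skipAccumulator.value());
System.out.println("$$$$$$$$$$$$$$$ UN Filtered Count " + unSkipAccumulator.value());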

Both accumulators are 0, and since the stop words occur 5 times, "Filtered a stop word" gets printed 5 times. Edited the question :) Correct answer - Spark is lazy on transformations and only executes on actions; I forced a first() to make it work.
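One related caveat, from the Spark documentation rather than from this thread: accumulator updates performed inside transformations such as filter may be applied more than once if a task or stage is re-executed, so exactly-once counts are only guaranteed for updates made inside actions. A sketch of one common mitigation, caching the filtered RDD so later actions reuse it instead of re-running the filter (and incrementing the accumulators again):

// Cache the filtered RDD so subsequent actions (e.g. saveAsTextFile)
// read the cached data instead of re-running the filter, which would
// increment the accumulators a second time.
filteredWords.cache();
filteredWords.count();  // first action: runs the filter once, here

System.out.println("Filtered Count " + skipAccumulator.value());
System.out.println("UN Filtered Count " + unSkipAccumulator.value());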