Java: How do I sort word count program output by value (count)?


How can I sort the wordcount output by count/value instead of by key?

Normally, the output is

hi 2
hw 3
wr 1
r 3
But the desired output is

wr 1
hi 2
hw 3
r 3
My code is:

public class sortingprog {
     public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {
         private final static IntWritable one = new IntWritable(1);
         private Text word = new Text();

         public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
           String line = value.toString();
           StringTokenizer tokenizer = new StringTokenizer(line);
           while (tokenizer.hasMoreTokens()) {
             word.set(tokenizer.nextToken());
             output.collect(one,word);
           }
         }
       }


     public static class Reduce extends MapReduceBase implements Reducer<IntWritable,Text, IntWritable, Text> {
     public void reduce(Iterator<IntWritable> key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
            int sum=0;
           while (key.hasNext()) {
             sum+=key.next().get();
           }
           output.collect(new IntWritable(sum),value);

     }

    @Override
    public void reduce(IntWritable arg0, Iterator<Text> arg1,
            OutputCollector<IntWritable, Text> arg2, Reporter arg3)
            throws IOException {
        // TODO Auto-generated method stub

    }
     }

     public static class GroupComparator extends WritableComparator {
            protected GroupComparator() {
                super(IntWritable.class, true);
            }

            @SuppressWarnings("rawtypes")
            @Override
            public int compare(WritableComparable w1, WritableComparable w2) {
                IntWritable v1 = (IntWritable) w1;
                IntWritable v2 = (IntWritable) w2;          
                return -1 * v1.compareTo(v2);
            }
        }

       public static void main(String[] args) throws Exception {
         JobConf conf = new JobConf(sortingprog.class);
         conf.setJobName("wordcount");


         conf.setOutputKeyClass(IntWritable.class);
         conf.setOutputValueClass(Text.class);


         conf.setMapperClass(Map.class);
         conf.setReducerClass(Reduce.class);

         conf.setOutputValueGroupingComparator(GroupComparator.class);

         conf.setInputFormat(TextInputFormat.class);
         conf.setOutputFormat(TextOutputFormat.class);

         FileInputFormat.setInputPaths(conf, new Path(args[0]));
         FileOutputFormat.setOutputPath(conf, new Path(args[1]));

         JobClient.runJob(conf);
       }
}

What you are looking for is called a secondary sort. Tutorials on sorting by value in MapReduce cover it in detail.


You need to do the following:

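Create a custom WritableComparable that holds both fields (the word and its count). In its compareTo method, provide the logic for comparing two instances of the custom writable. The framework calls this method during the shuffle/sort phase to order the keys before they reach the reducer, so it is the key to the whole implementation. In the comparison, use only the second field (the count).

    public class CustomPair implements WritableComparable<CustomPair> {
        private String fld1;
        private int fld2;

        public CustomPair(String fld1, int fld2) {
            this.fld1 = fld1; // wr
            this.fld2 = fld2; // 1
        }

        @Override
        public int compareTo(CustomPair other) {
            // Compare on the count only. Comparing other to this gives
            // descending order; swap the arguments for ascending order.
            return Integer.compare(other.fld2, this.fld2);
        }

        @Override
        public void write(DataOutput dataOutput) throws IOException {
            dataOutput.writeUTF(fld1);
            dataOutput.writeInt(fld2);
        }

        // You must implement the remaining methods.
    }

Let me know if you need additional help.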
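The composite-key idea above can be tried outside a cluster. The following is a minimal sketch, not the answerer's full implementation: it assumes a hypothetical `CustomPair` with fields named `word` and `count`, and it implements plain `Comparable` so that it runs without Hadoop on the classpath. In a real job the class would implement `org.apache.hadoop.io.WritableComparable<CustomPair>`; the `write` and `readFields` methods here have the same shape as the ones that interface requires. The ordering is ascending by count, with ties broken by word, which is an assumption chosen to match the desired output in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical composite key (word, count). Plain Comparable stands in for
// Hadoop's WritableComparable so the sketch runs without Hadoop.
class CustomPair implements Comparable<CustomPair> {
    private String word;
    private int count;

    // Hadoop instantiates keys reflectively, so a no-arg constructor is needed.
    CustomPair() { }

    CustomPair(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // Ascending by count; ties broken by word so the order is deterministic.
    @Override
    public int compareTo(CustomPair other) {
        int byCount = Integer.compare(this.count, other.count);
        return byCount != 0 ? byCount : this.word.compareTo(other.word);
    }

    // Serialize both fields, in the same order readFields expects them.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(count);
    }

    // Restore both fields from the serialized form.
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        count = in.readInt();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CustomPair)) return false;
        CustomPair p = (CustomPair) o;
        return count == p.count && Objects.equals(word, p.word);
    }

    @Override
    public int hashCode() {
        return Objects.hash(word, count);
    }

    @Override
    public String toString() {
        return word + " " + count;
    }
}

class CustomPairDemo {
    // Returns a sorted copy, ordered by CustomPair.compareTo.
    static List<CustomPair> sortByCount(List<CustomPair> pairs) {
        List<CustomPair> sorted = new ArrayList<>(pairs);
        Collections.sort(sorted);
        return sorted;
    }

    // Writes a pair to bytes and reads it back, to check write/readFields agree.
    static CustomPair roundTrip(CustomPair pair) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            pair.write(new DataOutputStream(bytes));
            CustomPair copy = new CustomPair();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<CustomPair> counts = new ArrayList<>();
        counts.add(new CustomPair("hi", 2));
        counts.add(new CustomPair("hw", 3));
        counts.add(new CustomPair("wr", 1));
        counts.add(new CustomPair("r", 3));
        // Prints: wr 1, hi 2, hw 3, r 3 (one pair per line)
        for (CustomPair p : sortByCount(counts)) {
            System.out.println(p);
        }
    }
}
```

Note that the counts have to exist before they can be used as part of a key, so a composite key like this would typically be emitted by a pass that reads already-aggregated word counts rather than by the mapper that tokenizes the raw text.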