Hadoop MapReduce does not produce the desired output

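I have a large file containing patent information. The header is: "PATENT", "GYEAR", "GDATE", "APPYEAR", "COUNTRY", "POSTATE", "ASSIGNEE", "ASSCODE", "CLAIMS"

I want to compute the average number of claims per patent for each year, where the key is the year and the value is the average. However, the reducer output shows an average of 1.0 for every year. Where does my program go wrong?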

Main class

 public static void main(String [] args) throws Exception{
    int res = ToolRunner.run(new Configuration(), new AvgClaimsByYear(), args);
    System.exit(res);
}
Driver class

    Configuration config = this.getConf();  
    Job job = Job.getInstance(config, "average claims per year"); 
    job.setJarByClass(AvgClaimsByYear.class);
    job.setMapperClass(TheMapper.class);
    job.setPartitionerClass(ThePartitioner.class);
    job.setNumReduceTasks(4);
    job.setReducerClass(TheReducer.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
Mapper class

    public static class TheMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
      private IntWritable yearAsKeyOut = new IntWritable();
      private IntWritable claimsAsValueOut = new IntWritable(1);
      @Override
      public void map(LongWritable keyIn, Text valueIn, Context context) throws IOException,InterruptedException {
        String line = valueIn.toString();
        if(line.contains("PATENT")) {
            return; //skip header
        }
        else {
            String [] patentData = line.split(","); 
            yearAsKeyOut.set(Integer.parseInt(patentData[1])); 
            if (patentData[8].length() > 0) {
                claimsAsValueOut.set(Integer.parseInt(patentData[8]));
            }
        }
        context.write(yearAsKeyOut, claimsAsValueOut);
    }   
}
Output

1963 1.0 
1964 1.0
1965 1.0 
1966 1.0 
1967 1.0 
1968 1.0 
1969 1.0 
1970 1.0

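In calculateAvgClaimPerPatent(), the expression performs integer division before the result is converted to float. Convert the two integers to float before dividing.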
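To make the difference concrete, here is a minimal standalone sketch with no Hadoop dependency. The class and method names are made up for illustration. It contrasts casting the result of an integer division with casting an operand before dividing. Note that the reducer listing in this thread already uses the second form.

```java
public class AverageDivisionDemo {
    // Integer division runs first; the cast only converts the truncated result.
    static float truncatedAverage(int totalClaims, int totalPatentCount) {
        return (float) (totalClaims / totalPatentCount);
    }

    // Casting one operand first forces floating-point division.
    static float exactAverage(int totalClaims, int totalPatentCount) {
        return (float) totalClaims / totalPatentCount;
    }

    public static void main(String[] args) {
        System.out.println(truncatedAverage(7, 2)); // 3.0
        System.out.println(exactAverage(7, 2));     // 3.5
    }
}
```

Truncation alone would yield whole-number averages, not necessarily 1.0 for every year, so it is worth checking the values that reach the reducer as well.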

--Edit--


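Also, looking at the code again, the average that gets written out is actually the average number of claims per record, grouped by the 4 intervals defined by the partitioner. In other words, the number of claims of a patent from 1972 gets averaged with that of another patent from 1975. That does not match your problem description.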
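The source of ThePartitioner is not shown in this thread, so the following is only a hypothetical sketch of a four-way year-range split. The boundaries and names are invented for illustration. It mirrors the decision a Hadoop `Partitioner.getPartition(key, value, numPartitions)` implementation makes. One point worth checking against the remark above: Hadoop still calls `reduce()` once per distinct key within a partition, so with the year as key a range partitioner on its own decides which output file a year lands in; it does not merge years.

```java
public class YearRangePartitionDemo {
    // Hypothetical upper bounds; the real ranges of ThePartitioner are unknown.
    static final int[] UPPER_BOUNDS = {1970, 1980, 1990};

    // Returns a partition index in [0, 3], one per reduce task.
    static int partitionFor(int year) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (year <= UPPER_BOUNDS[i]) {
                return i;
            }
        }
        return UPPER_BOUNDS.length; // everything after the last bound
    }

    public static void main(String[] args) {
        // 1972 and 1975 share a partition, but remain separate keys for reduce().
        System.out.println(partitionFor(1972));
        System.out.println(partitionFor(1975));
    }
}
```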

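Judging from the code, I think it computes the average number of claims per year, not the average number of claims per patent per year. To keep things simple, you can drop the custom partitioner. You can build a composite key from patent + year and use the claims as the value. You could create a separate key class if needed, but I think plain string concatenation is enough to produce the "composite" key. Also, setting a combiner class will greatly improve overall performance (the reducer shown here cannot be reused for that as-is, since it emits FloatWritable averages instead of IntWritable values, and averaging partial averages gives a wrong result).

But from the look of the code, you are computing claims per year, not claims per patent per year.

Hi, I believe the naming convention of the method is confusing. I don't understand why the reducer's average output is 1.0. I have to use the partitioner to split the years into 4 folders.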
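On the combiner suggestion: a combiner for an average has to pass along partial sums and counts, because an average of averages is generally not the overall average. The sketch below shows this in plain Java, without Hadoop. `SumCount` and the method names are hypothetical; in a real job the pair would be a custom Writable, which is an assumption not covered in this thread.

```java
import java.util.Arrays;
import java.util.List;

public class PartialAverageDemo {
    // Partial result a combiner could emit: a running sum plus the record count behind it.
    static final class SumCount {
        final long sum;
        final long count;
        SumCount(long sum, long count) { this.sum = sum; this.count = count; }
        SumCount merge(SumCount other) { return new SumCount(sum + other.sum, count + other.count); }
        double average() { return (double) sum / count; }
    }

    // Combiner step: collapse one map task's values into a single (sum, count) pair.
    static SumCount combine(List<Integer> claims) {
        long sum = 0;
        for (int c : claims) sum += c;
        return new SumCount(sum, claims.size());
    }

    // What happens if each map task emits an average and the reducer averages those.
    static double averageOfAverages(List<List<Integer>> splits) {
        double total = 0;
        for (List<Integer> split : splits) total += combine(split).average();
        return total / splits.size();
    }

    // Reduce step that stays correct: merge the pairs first, divide once at the end.
    static double mergedAverage(List<List<Integer>> splits) {
        SumCount acc = new SumCount(0, 0);
        for (List<Integer> split : splits) acc = acc.merge(combine(split));
        return acc.average();
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = Arrays.asList(Arrays.asList(10, 20), Arrays.asList(30));
        System.out.println(averageOfAverages(splits)); // 22.5
        System.out.println(mergedAverage(splits));     // 20.0
    }
}
```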
Reducer class

 public static class TheReducer extends Reducer<IntWritable,IntWritable,IntWritable,FloatWritable> {
    @Override
    public void reduce(IntWritable yearKey, Iterable<IntWritable> values, Context context) throws IOException,InterruptedException {
        int totalClaimsThatYear = 0;
        int totalPatentCountThatYear = 0;
        FloatWritable avgClaim = new FloatWritable();

        for(IntWritable value : values) {

            totalClaimsThatYear += value.get();
            totalPatentCountThatYear += 1;      
        }
        avgClaim.set(calculateAvgClaimPerPatent (totalPatentCountThatYear, totalClaimsThatYear)); 
        context.write(yearKey, avgClaim);
    }

    public float calculateAvgClaimPerPatent (int totalPatentCount, int totalClaims) {
        return (float)totalClaims/totalPatentCount;
    }
}
Sample input

  3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
  3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
  3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
  3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
  3070805,1963,1096,,"US","CA",,1,,2,6,63,,1,,0,,,,,,,
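One thing worth checking, based only on the five sample rows above: the mapper reads CLAIMS from `patentData[8]`, and in these rows that field is empty. When it is empty, the mapper skips `claimsAsValueOut.set(...)`, so the reused writable keeps its initial value of 1 (or whatever an earlier record left in it). That would give an average of exactly 1.0 for any year whose rows carry no claims count. This is an inference from the sample data, not something confirmed in the thread. The sketch below repeats the mapper's split on the first sample row; the class name and helper are made up.

```java
public class ClaimsColumnCheck {
    static final int CLAIMS_INDEX = 8; // index the mapper in the question reads

    // Splits a line the same way the mapper does and returns the CLAIMS field.
    static String claimsField(String line) {
        String[] patentData = line.split(",");
        return patentData.length > CLAIMS_INDEX ? patentData[CLAIMS_INDEX] : "";
    }

    public static void main(String[] args) {
        String row = "3070801,1963,1096,,\"BE\",\"\",,1,,269,6,69,,1,,0,,,,,,,";
        // An empty result means the mapper would not overwrite its default value.
        System.out.println("[" + claimsField(row) + "]"); // prints []
    }
}
```

If that turns out to be the cause, skipping records with an empty CLAIMS field instead of writing the stale value would be one way to handle it.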