Java: Reading a Hadoop reducer's output file


java hadoop mapreduce


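I am trying to read and analyze the final output of a MapReduce job in Hadoop. Below is part of the code from my "job" file. I want to use the FileSystem (Hadoop API) to read the output files, but I am not sure where to place the code highlighted in bold (between the double asterisks). If I put it below the System.exit call, I am afraid the code will be skipped.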

public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: format is <in> <out> <keyword>");
            System.exit(2);
        }

        **Path distCache = new Path("/");
        String fileSys = conf.get("fs.default.name");
        HashMap<String, Integer> jobCountMap = new HashMap<String, Integer>();**

        conf.set("jobTest", otherArgs[2]);
        Job job = new Job(conf, "job count");
        job.setJarByClass(JobResults.class);
        job.setMapperClass(JobMapper.class);
        job.setCombinerClass(JobReducer.class);
        job.setReducerClass(JobReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        distCache = new Path(args[2]);
        // FileSystem fs = distCache.getFileSystem(conf); // for Amazon AWS
        if (fileSys.split(":")[0].trim().equalsIgnoreCase("s3n")) distCache = new Path("s3n:/" + distCache);

        FileSystem fs = FileSystem.get(conf);           // for local cluster

        Path pathPattern = new Path(distCache, "part-r-[0-9]*");
        FileStatus[] list = fs.globStatus(pathPattern);

        for (FileStatus status : list)
        {
            // DistributedCache.addCacheFile(status.getPath().toUri(), conf);
            try {
                BufferedReader brr = new BufferedReader(new FileReader(status.getPath().toString()));
                String line;
                while ((line = brr.readLine()) != null)
                {
                    String[] resultsCount = line.split("\\|");
                    jobCountMap.put(resultsCount[0], Integer.parseInt(resultsCount[1].trim()));
                }
            } catch (FileNotFoundException e)
            {
                e.printStackTrace();
            } catch (IOException e)
            {
                e.printStackTrace();
            }
        }

        System.out.println("the size of Hashmap is: " + jobCountMap.size());
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
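The parsing loop above can be pulled into a small helper that is easy to test on its own. This is a plain-Java sketch, assuming each output line looks like `key|count`, as the question's `split("\\|")` implies (Hadoop's default `TextOutputFormat` separates key and value with a tab, so the `|` separator is an assumption about this particular job). It also closes the reader with try-with-resources, which the loop in the question never does. The class `ReducerOutputParser` and method `parseInto` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class ReducerOutputParser {

    // Reads "key|count" lines (assumed format) and stores them in the map.
    // Keys and counts are trimmed; lines with no separator or a
    // non-numeric count are skipped instead of throwing.
    static void parseInto(BufferedReader reader, Map<String, Integer> counts) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split("\\|", 2);
            if (parts.length < 2) {
                continue; // no separator on this line
            }
            try {
                counts.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
            } catch (NumberFormatException e) {
                // count is not an integer; skip the line
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        // try-with-resources closes the reader even if parsing throws
        try (BufferedReader reader = new BufferedReader(
                new StringReader("hadoop|3\njava|5\nmalformed line\n"))) {
            parseInto(reader, counts);
        }
        System.out.println("the size of Hashmap is: " + counts.size());
    }
}
```

Running `main` prints `the size of Hashmap is: 2`, because the malformed line is ignored.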

There is a fairly simple solution to the System.exit problem. Where you had:

System.out.println("the size of Hashmap is: " + jobCountMap.size());
System.exit(job.waitForCompletion(true) ? 0 : 1);

put the following instead:

System.out.println("the size of Hashmap is: " + jobCountMap.size());
boolean completionStatus = job.waitForCompletion(true);

// your code here: the job has finished, so its output files can be read now

if (completionStatus) {
    System.exit(0);
} else {
    System.exit(1);
}

This should let you run whatever processing you need in the main method, including launching a second job if required.
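The ordering described above (wait for the job, post-process its output, then exit) can be sketched without a cluster. In this plain-Java stand-in, a hypothetical `runJob` supplier plays the role of `job.waitForCompletion(true)`, and `java.nio` directory globbing plays the role of `fs.globStatus(new Path(outputDir, "part-r-[0-9]*"))`. On HDFS you would open each file with `fs.open(status.getPath())` rather than `new FileReader(...)`, because `FileReader` only sees the local disk. Note also that the question builds `distCache` from `args[2]`, which appears to be the `<keyword>` argument; the reducer output is written to the directory passed as `otherArgs[1]`. The class and method names and the `key|count` line format are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

public class PostJobReader {

    // Reads every part-r-NNNNN file in outputDir into a key -> count map.
    // Assumes "key|count" lines; Hadoop's default TextOutputFormat uses a tab.
    // Files such as _SUCCESS do not match the glob and are ignored.
    static Map<String, Integer> readPartFiles(Path outputDir) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-r-[0-9]*")) {
            for (Path part : parts) {
                try (BufferedReader reader = Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String[] fields = line.split("\\|", 2);
                        if (fields.length == 2) {
                            counts.put(fields[0].trim(), Integer.parseInt(fields[1].trim()));
                        }
                    }
                }
            }
        }
        return counts;
    }

    // Runs the (hypothetical) job, reads its output only after it has
    // finished successfully, and returns an exit code instead of exiting.
    static int runAndReport(BooleanSupplier runJob, Path outputDir) throws IOException {
        boolean completionStatus = runJob.getAsBoolean(); // stands in for job.waitForCompletion(true)
        if (!completionStatus) {
            return 1;
        }
        Map<String, Integer> counts = readPartFiles(outputDir);
        System.out.println("the size of Hashmap is: " + counts.size());
        return 0;
    }

    public static void main(String[] args) throws IOException {
        // Fake job output directory with two part files and a _SUCCESS marker.
        Path outputDir = Files.createTempDirectory("job-output");
        Files.write(outputDir.resolve("part-r-00000"), "hadoop|3\n".getBytes(StandardCharsets.UTF_8));
        Files.write(outputDir.resolve("part-r-00001"), "java|5\n".getBytes(StandardCharsets.UTF_8));
        Files.write(outputDir.resolve("_SUCCESS"), new byte[0]);

        // System.exit is called exactly once, after all post-processing.
        System.exit(runAndReport(() -> true, outputDir));
    }
}
```

Returning the exit code from a helper and calling `System.exit` only once, at the very end of `main`, keeps every line of post-processing reachable, which is the point of the answer.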