Hadoop MapReduce job produces no output

java hadoop mapreduce

I am writing a MapReduce job in NetBeans and building the jar file there as well. To run the job in Hadoop (version 1.2.1) I execute the following command:

$ hadoop jar job.jar org.job.mainClass /home/user/in.txt /home/user/outdir

The command shows no errors, but it does not create outdir or any output files.

This is my job code:

Mapper

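import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class Mapper extends MapReduceBase implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private final Text company = new Text("");

    // Emits (company name, 1) for every input line
    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        company.set(value.toString());
        output.collect(company, one);
    }
}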

The input file has the following format:

name1
name2
name3
...

That said, I run Hadoop in a virtual machine (Ubuntu 12.04) without root privileges. Could Hadoop be executing the job and storing the files in a different directory?

You need to submit your JobConf using:

JobClient.runJob(configuration);

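The correct hadoop command is

$ hadoop jar job.jar /home/user/in.txt /home/user/outdir

not

hadoop jar myjar packagename.DriverClass input output

With the class name on the command line, Hadoop treated org.job.mainClass as the input file and in.txt as the output file, and the run ended with "File already exists: in.txt". This is the behaviour to expect when the jar's manifest already names a main class (NetBeans can write one when it builds the jar), since every argument after the jar name is then passed to main(). The code works with the following main method: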

public static void main(String[] args) throws FileNotFoundException, IOException {

    // args[0] is the input path, args[1] the output directory
    JobConf configuration = new JobConf(CdrMR.class);
    configuration.setJobName("Dedupe companies");
    configuration.setOutputKeyClass(Text.class);
    configuration.setOutputValueClass(IntWritable.class);
    configuration.setMapperClass(NameMapper.class);
    configuration.setReducerClass(NameReducer.class);
    configuration.setInputFormat(TextInputFormat.class);
    configuration.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(configuration, new Path(args[0]));
    FileOutputFormat.setOutputPath(configuration, new Path(args[1]));
    System.out.println("Hello Hadoop");
    // Submit the job, wait for it to finish, and report the outcome in the exit code
    System.exit(JobClient.runJob(configuration).isSuccessful() ? 0 : 1);
}
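Because main reads args[0] and args[1] without checking them, a shifted command line goes unnoticed until the job itself complains. A minimal sketch of a guard that could run at the top of main is shown below. The class name DriverArgsCheck and its messages are hypothetical, and it assumes the paths are on the local filesystem, as in the question. For paths in HDFS the same checks would have to go through Hadoop's own FileSystem API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: validates the two driver arguments.
class DriverArgsCheck {

    /** Returns null when the arguments look usable, otherwise a description of the problem. */
    static String check(String[] args) {
        if (args.length != 2) {
            return "expected <input> <output>, got " + args.length + " argument(s)";
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        if (!Files.exists(input)) {
            return "input path does not exist: " + input;
        }
        // The output directory of a job must not exist before the job runs.
        if (Files.exists(output)) {
            return "output path already exists: " + output;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = check(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(2);
        }
        System.out.println("arguments look fine");
    }
}
```

With the three arguments from the question (class name, input, output) the first check already fails, which points directly at the extra token.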

Thanks @AlexeyShestakov and @Y.Prithvi
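The argument shift described in this answer can be made concrete with a small model. The sketch below is a simplified imitation of how hadoop jar picks the main class, not Hadoop's actual code. The names JarArgsModel, Resolved and resolve are made up for illustration, and it assumes that a Main-Class entry in the manifest takes precedence over anything typed on the command line.

```java
import java.util.Arrays;

// Simplified model of how `hadoop jar` splits its command line.
// Hypothetical names; this is an illustration, not Hadoop source code.
class JarArgsModel {

    /** The class that would be launched and the arguments its main() receives. */
    static final class Resolved {
        final String mainClass;
        final String[] programArgs;

        Resolved(String mainClass, String[] programArgs) {
            this.mainClass = mainClass;
            this.programArgs = programArgs;
        }
    }

    /**
     * @param manifestMainClass Main-Class from the jar manifest, or null if absent
     * @param afterJar          every token typed after the jar file name
     */
    static Resolved resolve(String manifestMainClass, String[] afterJar) {
        if (manifestMainClass != null) {
            // The manifest decides: all remaining tokens go to main() unchanged.
            return new Resolved(manifestMainClass, afterJar.clone());
        }
        if (afterJar.length == 0) {
            throw new IllegalArgumentException("no main class given and none in the manifest");
        }
        // No manifest entry: the first token is taken as the class name.
        return new Resolved(afterJar[0], Arrays.copyOfRange(afterJar, 1, afterJar.length));
    }

    public static void main(String[] args) {
        String[] typed = {"org.job.mainClass", "/home/user/in.txt", "/home/user/outdir"};
        Resolved r = resolve("org.job.mainClass", typed);
        // Here programArgs[0] is the class name and programArgs[1] is in.txt.
        System.out.println("input  = " + r.programArgs[0]);
        System.out.println("output = " + r.programArgs[1]);
    }
}
```

Calling resolve with null as the first argument models a jar without a Main-Class entry. In that case the class name is consumed and the input and output paths land in positions 0 and 1 as intended.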

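The correct hadoop command is either

$ hadoop jar job.jar /home/user/in.txt /home/user/outdir

or, with the driver class named explicitly,

hadoop jar myjar packagename.DriverClass input output

depending on how the project is laid out.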

Case 1

MapReduceProject
    |
    |__ src
         |
         |__ package1
            - Driver
            - Mapper
            - Reducer

Then you can use

hadoop jar myjar input output

Case 2

MapReduceProject
    |
    |__ src
         |
         |__ package1
         |  - Driver1
         |  - Mapper1
         |  - Reducer1
         |
         |__ package2
            - Driver2
            - Mapper2
            - Reducer2

In this case you must specify the driver class in the hadoop command

hadoop jar myjar packagename.DriverClass input output
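One way to tell which of the two cases applies to a particular jar is to look at its manifest. The sketch below assumes that the deciding factor is whether the manifest carries a Main-Class entry, which a single-driver project built by an IDE often has. The class name ManifestMainClass and its methods are hypothetical. It uses only the standard java.util.jar API.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reports the Main-Class entry of a jar, if any.
class ManifestMainClass {

    /** Returns the Main-Class value of the manifest, or null when it is missing. */
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    /** Opens the jar at jarPath and returns its Main-Class, or null when it has none. */
    static String read(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return mainClassOf(jar.getManifest());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ManifestMainClass <jar file>");
            return;
        }
        String mainClass = read(args[0]);
        System.out.println(mainClass == null
                ? "no Main-Class entry: name the driver class on the command line"
                : "Main-Class is " + mainClass + ": pass only input and output");
    }
}
```

If the jar reports a Main-Class, the short form with only input and output is the one to use. If it reports none, the driver class has to be named after the jar.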

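As which user do you run hadoop, and where do you store the output? Is it the same user in both cases?

Yes, it is the same user, and the output goes to that user's home dir.

Add this as the last line of the main method: System.exit(configuration.waitForCompletion(true) ? 0 : 1);

The JobConf object has no waitForCompletion member. I tried the following code instead: System.exit(JobClient.runJob(configuration).isSuccessful() ? 0 : 1); but the result is the same. If I run echo $? the result is 0.

Your mapper is called Mapper, not NameMapper. The same goes for the reducer. Try changing NameMapper.class to Mapper.class.

Sorry. I changed the class names to make the question clearer. The code compiles without errors. I tried passing an input file that does not exist and echo $? still returns 0. No output is created.

The correct hadoop command is "hadoop jar job.jar org.job.mainClass /home/user/in.txt /home/user/outdir", but that form applies when the project contains several packages with different jobs. If the project has only one package, the shorter form without the class name is fine.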