
Java: How to create a chain of Hadoop jobs without using Oozie


I want to create a chain of three Hadoop jobs, where the output of one job becomes the input of the next, and so on. I want to do this without using Oozie.

I have written the following code to achieve this:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TfIdf {
    public static void main(String args[]) throws IOException, InterruptedException, ClassNotFoundException
    {
        TfIdf tfIdf = new TfIdf();
        tfIdf.runWordCount();
        tfIdf.runDocWordCount();
        tfIdf.TFIDFComputation();
    }

    public void runWordCount() throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();

        job.setJarByClass(TfIdf.class);
        job.setJobName("Word Count calculation");

        job.setMapperClass(WordFrequencyMapper.class);
        job.setReducerClass(WordFrequencyReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("ouput"));

        job.waitForCompletion(true);
    }

    public void runDocWordCount() throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();

        job.setJarByClass(TfIdf.class);
        job.setJobName("Word Doc count calculation");

        job.setMapperClass(WordCountDocMapper.class);
        job.setReducerClass(WordCountDocReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path("output"));
        FileOutputFormat.setOutputPath(job, new Path("ouput_job2"));

        job.waitForCompletion(true);
    }

    public void TFIDFComputation() throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();

        job.setJarByClass(TfIdf.class);
        job.setJobName("TFIDF calculation");

        job.setMapperClass(TFIDFMapper.class);
        job.setReducerClass(TFIDFReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path("output_job2"));
        FileOutputFormat.setOutputPath(job, new Path("ouput_job3"));

        job.waitForCompletion(true);
    }
}
But I get the following error:

Input path does not exist: hdfs://localhost.localdomain:8020/user/cloudera/output

Can anyone help me with this?

This answer comes a little late, but... it is just a simple typo in your directory names. You have written the output of the first job to the directory "ouput", while the second job is looking for it in "output".
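One way to make this kind of typo impossible, and to stop the chain as soon as a job fails, is to define each directory name once as a constant and check the return value of waitForCompletion. Below is an untested sketch of the first job rewritten in that style (the constant names and the TfIdfChain class name are my own; the mapper/reducer classes are the ones from the question, and Job.getInstance() replaces the deprecated new Job() constructor):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TfIdfChain {
    // Each directory name is defined exactly once, so a job's input
    // can never drift out of sync with the previous job's output.
    private static final String INPUT    = "input";
    private static final String OUT_JOB1 = "output_job1"; // job 1 writes, job 2 reads
    private static final String OUT_JOB2 = "output_job2"; // job 2 writes, job 3 reads
    private static final String OUT_JOB3 = "output_job3";

    public static void main(String[] args) throws Exception {
        Job job1 = Job.getInstance();
        job1.setJarByClass(TfIdfChain.class);
        job1.setJobName("Word Count calculation");
        job1.setMapperClass(WordFrequencyMapper.class);
        job1.setReducerClass(WordFrequencyReducer.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job1, new Path(INPUT));
        FileOutputFormat.setOutputPath(job1, new Path(OUT_JOB1));

        // Abort the chain if the job fails, instead of feeding an
        // empty or missing directory into the next job.
        if (!job1.waitForCompletion(true)) {
            System.exit(1);
        }

        // Jobs 2 and 3 are configured the same way: job 2 reads OUT_JOB1
        // and writes OUT_JOB2; job 3 reads OUT_JOB2 and writes OUT_JOB3.
    }
}
```

For larger chains, Hadoop also ships a org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl class that manages dependencies between ControlledJob instances, which avoids hand-ordering the waitForCompletion calls.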

What does hadoop fs -ls /user/cloudera show?

[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera
Found 4 items
drwx------   - cloudera cloudera          0 2013-10-31 01:37 /user/cloudera/.Trash
drwx------   - cloudera cloudera          0 2013-11-13 11:02 /user/cloudera/.staging
drwxr-xr-x   - cloudera cloudera          0 2013-11-07 19:20 /user/cloudera/input
drwxr-xr-x   - cloudera cloudera          0 2013-11-13 11:02 /user/cloudera/ouput

What about using hadoop fs -ls hdfs://localhost.localdomain:8020/user/cloudera/ instead?