(Hadoop) MapReduce - chain jobs - JobControl doesn't stop
I need to chain two MapReduce jobs. I used JobControl to set job2 as dependent on job1. It works and the output files are created, but the program never stops! In the shell it stays in this state:
12/09/11 19:06:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/09/11 19:06:25 INFO input.FileInputFormat: Total input paths to process : 1
12/09/11 19:06:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/09/11 19:06:25 WARN snappy.LoadSnappy: Snappy native library not loaded
12/09/11 19:07:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/09/11 19:07:00 INFO input.FileInputFormat: Total input paths to process : 1
How can I stop it?
Here is my main method:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Configuration conf2 = new Configuration();
Job job1 = new Job(conf, "canzoni");
job1.setJarByClass(CanzoniOrdinate.class);
job1.setMapperClass(CanzoniMapper.class);
job1.setReducerClass(CanzoniReducer.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);
ControlledJob cJob1 = new ControlledJob(conf);
cJob1.setJob(job1);
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, new Path("/user/hduser/tmp"));
Job job2 = new Job(conf2, "songsort");
job2.setJarByClass(CanzoniOrdinate.class);
job2.setMapperClass(CanzoniSorterMapper.class);
job2.setSortComparatorClass(ReverseOrder.class);
job2.setInputFormatClass(KeyValueTextInputFormat.class);
job2.setReducerClass(CanzoniSorterReducer.class);
job2.setMapOutputKeyClass(IntWritable.class);
job2.setMapOutputValueClass(Text.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(IntWritable.class);
ControlledJob cJob2 = new ControlledJob(conf2);
cJob2.setJob(job2);
FileInputFormat.addInputPath(job2, new Path("/user/hduser/tmp/part*"));
FileOutputFormat.setOutputPath(job2, new Path(args[1]));
JobControl jobctrl = new JobControl("jobctrl");
jobctrl.addJob(cJob1);
jobctrl.addJob(cJob2);
cJob2.addDependingJob(cJob1);
// jobctrl.run();  // original call, removed: run() blocks the calling thread and never returns
// NEW CODE: run JobControl in its own thread and poll until all jobs have finished
Thread t = new Thread(jobctrl);
t.start();
String oldStatusJ1 = null;
String oldStatusJ2 = null;
while (!jobctrl.allFinished()) {
String status =cJob1.toString();
String status2 =cJob2.toString();
if (!status.equals(oldStatusJ1)) {
System.out.println(status);
oldStatusJ1 = status;
}
if (!status2.equals(oldStatusJ2)) {
System.out.println(status2);
oldStatusJ2 = status2;
}
}
System.exit(0);
}
}
I basically did what Pietro mentioned above:
public class JobRunner implements Runnable {
private JobControl control;
public JobRunner(JobControl _control) {
this.control = _control;
}
public void run() {
this.control.run();
}
}
In my map/reduce class, I have:
public void handleRun(JobControl control) throws InterruptedException {
JobRunner runner = new JobRunner(control);
Thread t = new Thread(runner);
t.start();
while (!control.allFinished()) {
System.out.println("Still running...");
Thread.sleep(5000);
}
}
where I simply pass in the JobControl object.
The JobControl object is itself a Runnable, so you can use it like this:
new Thread(myJobControlInstance).start()
Just a tweak to the snippet shared by sinemetu1: since JobControl itself implements Runnable, you can drop the JobRunner wrapper.
Thread thread = new Thread(jobControl);
thread.start();
while (!jobControl.allFinished()) {
System.out.println("Still running...");
Thread.sleep(5000);
}
I also came across a link where a user confirms that JobControl can only be run from a new thread.
Try this:
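The pattern in the snippets above can be tried without a cluster. The following is a minimal sketch, not Hadoop code: `StandInJobControl` is a hypothetical stand-in written for illustration, and `runAndWait` is an invented helper name. It assumes, as the behaviour described in this thread suggests, that the real `JobControl.run()` keeps looping until `stop()` is called, which is why calling it directly on the main thread never returns.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Hadoop's JobControl (NOT the real class).
// Assumption: like the real one, run() keeps looping until stop() is called.
class StandInJobControl implements Runnable {
    private final AtomicInteger remainingJobs;
    private volatile boolean stopRequested = false;

    StandInJobControl(int jobs) {
        this.remainingJobs = new AtomicInteger(jobs);
    }

    @Override
    public void run() {
        while (!stopRequested) {
            // Pretend one job completes per iteration until none are left.
            if (remainingJobs.get() > 0) {
                remainingJobs.decrementAndGet();
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    boolean allFinished() {
        return remainingJobs.get() == 0;
    }

    void stop() {
        stopRequested = true;
    }
}

class JobControlThreadSketch {
    // Starts the controller in its own thread, polls until every job is done,
    // then asks the controller to stop so its thread can exit.
    // Returns true if the controller thread ended within joinTimeoutMs.
    static boolean runAndWait(StandInJobControl control, long pollMs, long joinTimeoutMs)
            throws InterruptedException {
        Thread t = new Thread(control, "job-control");
        t.setDaemon(true); // do not keep the JVM alive if something goes wrong
        t.start();
        while (!control.allFinished()) {
            Thread.sleep(pollMs); // sleep between checks instead of busy-waiting
        }
        control.stop();
        t.join(joinTimeoutMs);
        return !t.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        boolean stopped = runAndWait(new StandInJobControl(3), 20, 2000);
        System.out.println("controller thread stopped: " + stopped);
    }
}
```

With the real classes, the same shape would presumably be `new Thread(jobControl).start()`, a sleep-and-poll loop on `jobControl.allFinished()`, and then `jobControl.stop()`, which lets the program end without relying on `System.exit(0)`.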
Thread jcThread = new Thread(jobControl);
jcThread.start();
System.out.println("Polling jobControl status in a loop >>>>>>>>>>>>>>>>");
while (true) {
if (jobControl.allFinished()) {
System.out.println("====>> jobControl.allFinished=" + jobControl.getSuccessfulJobList());
jobControl.stop();
// without a break or return here, the program would loop forever
break;
}
if (jobControl.getFailedJobList().size() > 0) {
succ = 0;
System.out.println("====>> jobControl.getFailedJobList=" + jobControl.getFailedJobList());
jobControl.stop();
// without a break or return here, the program would loop forever
break;
}
}
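I solved the problem by starting JobControl in a thread. I use a while loop to check whether the jobs have finished, while (!jobctrl.allFinished()), followed by System.exit() after the loop. Now I would like the jobs to report informational messages. All I get is which job is running, via ControlledJob.toString(). I don't know how to obtain details such as the number of map tasks, the number of reduce tasks, the number of input or output records, and so on. Any ideas on how to get these messages?
Is job.getCounters().toString() enough?
Is this a bug in the JobControl class?
Hi, I wrote that code two years ago. I'll try to look into it tonight or tomorrow.
Hi @Austin a, I have edited my post with the code you asked for.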