Apache Pig: unusually long job start time

One of my Pig scripts (no more complex than any other script I have built) appears to loop for a long time before the job starts:

2013-10-08 10:46:07,655 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
2013-10-08 10:46:07,659 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 10
2013-10-08 10:46:09,168 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
2013-10-08 10:46:09,168 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 10
2013-10-08 10:46:11,381 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
2013-10-08 10:46:11,381 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 10
2013-10-08 10:46:13,875 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
2013-10-08 10:46:13,875 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 10
2013-10-08 10:46:16,303 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10

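It repeats the lines above for about 4 minutes, whereas this step normally finishes within a few seconds. The only thing I have tried so far is removing parts of the script, and that has not let me pin down a cause: the problem does not seem to come from any particular part of the script. I have other scripts just as complex as this one that do not show the problem. What could be causing it?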

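I can't say for certain without more information, but it looks like Pig is waiting for the cluster's JobTracker to start running the underlying Map/Reduce jobs generated by the script. This can happen for many reasons, such as running on a shared cluster whose resources are exhausted. You will most likely need to look at the cluster's JobTracker and/or TaskTrackers to find the exact cause.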
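
To compare this script against the ones that start quickly, it may help to measure how long the client keeps logging the repeated "Total input paths to process" lines. The following is only a rough sketch: it assumes the log lines start with a timestamp in the same format as the excerpt in the question, and the function name `stall_seconds` is made up for illustration. It does not diagnose the cause; it only puts a number on the delay.

```python
from datetime import datetime

# Text that identifies the repeated lines in the Pig client log (taken from the excerpt above).
MARKER = "Total input paths to process"

# Assumed timestamp layout: "2013-10-08 10:46:07,655" (23 characters at the start of the line).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_LENGTH = 23


def stall_seconds(lines):
    """Return the seconds elapsed between the first and last marker line.

    Lines that do not contain the marker are ignored. If fewer than two
    marker lines are found, 0.0 is returned.
    """
    stamps = []
    for line in lines:
        if MARKER in line:
            stamps.append(datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT))
    if len(stamps) < 2:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds()


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python stall.py pig_client.log
    with open(sys.argv[1], encoding="utf-8") as log_file:
        print(f"{stall_seconds(log_file):.3f} seconds spent in the repeated phase")
```

Running this on the logs of a fast script and of the slow one gives two numbers to bring along when checking the JobTracker, for example to see whether the delay matches a period when the cluster was busy.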