Java FileNotFoundException when adding a file to the cache - Hadoop MapReduce

Note: I have already looked through posts here with similar problems and tried the different approaches suggested there, but I still cannot resolve the issue. I want to add a file from HDFS to the mapper's cache, so I added it in the driver like this:
// Driver program
public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "QuestionOne");
    Configuration conf = job.getConfiguration();
    // I am passing my file path (which is in HDFS) as an argument. Eg: /input/users.dat
    job.addCacheFile(new URI(args[1]));
    job.setJarByClass(QuestionOne.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Then in the map class I retrieve the file and use it like this:
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    protected void setup(Context context) throws IOException, InterruptedException {
        ...
        URI[] files = context.getCacheFiles();
        for (URI p : files) {
            System.out.println(p.getPath().toString()); // prints "/input/users.dat"
            // Exception (FileNotFoundException) at this line
            BufferedReader br = new BufferedReader(new FileReader(new File(p.getPath().toString())));
            // Use br
            br.close();
        }
    }
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        ...
        ...
    }
    protected void cleanup(Context context) throws IOException, InterruptedException {
        ...
        ...
    }
}
But when I run the program, I get a FileNotFoundException, as shown below:
14/10/25 03:00:29 WARN mapred.LocalJobRunner: job_local30078493_0001
java.lang.Exception: java.io.FileNotFoundException: /input/users.dat (No such file or directory)
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.FileNotFoundException: /hw1_input/users.dat (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileReader.<init>(FileReader.java:72)
at QuestionOne$Map.setup(QuestionOne.java:46)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/10/25 03:00:30 INFO mapreduce.Job: Job job_local30078493_0001 running in uber mode : false
Please help me solve this problem.

Answer: You need to use the distributed file system, not the local file system:
FileSystem fs = FileSystem.get(context.getConfiguration());
for (URI p : files) {
    Path path = new Path(p.toString());
    FSDataInputStream fsin = fs.open(path);
    DataInputStream in = new DataInputStream(fsin);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    // Use br
    br.close();
    in.close();
    fsin.close();
}
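As a side note (not part of the original answer): Hadoop's distributed cache also localizes cached files onto each task node, and if you append a `#name` fragment to the cache URI, the framework creates a symlink under that name in the task's working directory, so plain local-file I/O works in the mapper. A minimal sketch of that alternative, assuming the same `/input/users.dat` path:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.net.URI;

// In the driver: the "#users.dat" fragment asks Hadoop to create a
// symlink named "users.dat" in each task's working directory.
job.addCacheFile(new URI("/input/users.dat#users.dat"));

// In the mapper's setup(): the localized symlink can now be opened
// with ordinary java.io, so the original FileReader approach works.
BufferedReader br = new BufferedReader(new FileReader("users.dat"));
String line;
while ((line = br.readLine()) != null) {
    // process line
}
br.close();
```

The `FileSystem` approach in the answer reads straight from HDFS on every task, while the symlink approach reads a local copy that Hadoop distributed once per node; either resolves the exception.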