Java Hadoop DistributedCache throws a FileNotFound error


I am trying to count only the words from an input file that appear in a listOfWords file. I get a FileNotFound error even though I have verified that the file is in the correct location in HDFS.

Inside the driver:

    Configuration conf = new Configuration();
    DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), conf);
    Job job = new Job(conf,"CountEachWord Job");
Inside the mapper:

private Path[] ref_file;
ArrayList<String> globalList = new ArrayList<String>();

public void setup(Context context) throws IOException{

    this.ref_file = DistributedCache.getLocalCacheFiles(context.getConfiguration());

    FileSystem fs = FileSystem.get(context.getConfiguration());

    FSDataInputStream in_file = fs.open(ref_file[0]);
    System.out.println("File opened");

    BufferedReader br  = new BufferedReader(new InputStreamReader(in_file));//each line of reference file
    System.out.println("BufferReader invoked");

    String eachLine = null;
    while((eachLine = br.readLine()) != null)
    {
        System.out.println("eachLine is: "+ eachLine);
        globalList.add(eachLine);

    }

}

I have verified that the file in question exists in HDFS. I also tried running with the LocalJobRunner. Still no luck.

You can try this to retrieve the files:

URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());

Then you can iterate over the files.
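For example, inside the mapper's setup() it could look something like this (a minimal sketch; matching the cache entry by the listOfWords name and collecting lines into the question's globalList are only illustrations):

    URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());
    if (files != null) {
        for (URI uri : files) {
            // match the cache entry by name; "listOfWords" is the file from the question
            if (uri.getPath().endsWith("listOfWords")) {
                FileSystem fs = FileSystem.get(context.getConfiguration());
                BufferedReader br = new BufferedReader(
                        new InputStreamReader(fs.open(new Path(uri.getPath()))));
                String line;
                while ((line = br.readLine()) != null) {
                    globalList.add(line);  // same list the question fills in setup()
                }
                br.close();
            }
        }
    }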

Try it like this.

In the driver:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path cachefile = new Path("path/to/file");
FileStatus[] list = fs.globStatus(cachefile);
for (FileStatus status : list) {
 DistributedCache.addCacheFile(status.getPath().toUri(), conf);
}
In the mapper's setup():
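A minimal sketch of that setup(), assuming the task reads the localized copies via DistributedCache.getLocalCacheFiles; the reading loop is only an illustration:

    Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    if (localFiles != null) {
        for (Path localFile : localFiles) {
            // the paths returned here are local to the task, so a plain FileReader works
            BufferedReader br = new BufferedReader(new FileReader(localFile.toString()));
            String line;
            while ((line = br.readLine()) != null) {
                // use the line, e.g. add it to a lookup list
            }
            br.close();
        }
    }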


In the main method I use this:

  Job job = Job.getInstance();
  job.setJarByClass(DistributedCacheExample.class);
  job.setJobName("Distributed cache example");
  job.addCacheFile(new Path("/user/cloudera/datasets/abc.dat").toUri());
Then in the Mapper I used this boilerplate:

  protected void setup(Context context) throws IOException, InterruptedException {
     URI[] files = context.getCacheFiles();
     for (URI file : files) {
       if (file.getPath().contains("abc.dat")) {
         Path path = new Path(file);
         BufferedReader reader = new BufferedReader(new FileReader(path.getName()));
         String line = reader.readLine();
         while (line != null) {
           ......
         }
       }
     }
  }
I am working with these dependencies:

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
  </dependency>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.3</version>
  </dependency>


My trick is to use path.getName() with the FileReader; without it, I get a FileNotFoundException.

Instead of DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), conf), try DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), job.getConfiguration()); otherwise the mapper may not find the file.
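Applied to the driver from the question, that would look roughly like this (a sketch; the only change is which Configuration the cache file is registered on):

    Configuration conf = new Configuration();
    Job job = new Job(conf, "CountEachWord Job");
    // register the file on the job's own configuration: new Job(conf) works on a copy,
    // so registering it there makes sure the submitted job actually carries the entry
    DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), job.getConfiguration());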