Java Hadoop DistributedCache throws FileNotFound error
I am trying to count only the words from a listOfWords file that appear in a given input file. I am getting a FileNotFound error even though I have verified that the file is in the correct location in HDFS.

Inside the driver:
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), conf);
Job job = new Job(conf,"CountEachWord Job");
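(For context, the rest of the driver is not shown in the question; a minimal continuation, with the class names and argument handling below being assumptions, might look like this.)

// Hypothetical continuation of the driver above; CountEachWord and
// CountEachWordMapper are assumed names, not from the question.
job.setJarByClass(CountEachWord.class);
job.setMapperClass(CountEachWordMapper.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);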
Inside the mapper:
private Path[] ref_file;
ArrayList<String> globalList = new ArrayList<String>();

public void setup(Context context) throws IOException {
    this.ref_file = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    FileSystem fs = FileSystem.get(context.getConfiguration());
    FSDataInputStream in_file = fs.open(ref_file[0]);
    System.out.println("File opened");
    BufferedReader br = new BufferedReader(new InputStreamReader(in_file)); // each line of reference file
    System.out.println("BufferReader invoked");
    String eachLine = null;
    while ((eachLine = br.readLine()) != null) {
        System.out.println("eachLine is: " + eachLine);
        globalList.add(eachLine);
    }
}
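(One likely cause, worth checking: DistributedCache.getLocalCacheFiles returns paths on the task node's local disk, so opening them with the HDFS FileSystem can fail. A minimal sketch of reading the localized copy through the local filesystem instead:)

// Sketch: getLocalCacheFiles returns *local* paths, so open them with the
// local FileSystem rather than the HDFS one.
Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
FileSystem localFs = FileSystem.getLocal(context.getConfiguration());
BufferedReader br = new BufferedReader(new InputStreamReader(localFs.open(cached[0])));
String line;
while ((line = br.readLine()) != null) {
    globalList.add(line);
}
br.close();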
I have verified that the file in question exists in HDFS. I also tried using the local runner. It still doesn't work.

You can try this to retrieve the file:

URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());

Then you can iterate over the files. Try it like this.

Driver:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path cachefile = new Path("path/to/file");
FileStatus[] list = fs.globStatus(cachefile);
for (FileStatus status : list) {
    DistributedCache.addCacheFile(status.getPath().toUri(), conf);
}
In the mapper's setup():
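(A sketch of what that setup() might look like, following the getCacheFiles call quoted above; the filename check and loop body here are assumptions:)

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    // Retrieve the URIs registered by the driver and pick out the word list.
    URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());
    for (URI uri : files) {
        if (uri.getPath().endsWith("listOfWords")) {
            // open and read this file here
        }
    }
}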
In the main method, I use this:
Job job = Job.getInstance();
job.setJarByClass(DistributedCacheExample.class);
job.setJobName("Distributed cache example");
job.addCacheFile(new Path("/user/cloudera/datasets/abc.dat").toUri());
Then in the Mapper, I used this boilerplate:
protected void setup(Context context) throws IOException, InterruptedException {
    URI[] files = context.getCacheFiles();
    for (URI file : files) {
        if (file.getPath().contains("abc.dat")) {
            Path path = new Path(file);
            BufferedReader reader = new BufferedReader(new FileReader(path.getName()));
            String line = reader.readLine();
            while (line != null) {
                ......
                line = reader.readLine(); // advance to the next line, otherwise this loops forever
            }
        }
    }
}
I am working with these dependencies:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.3</version>
</dependency>
My trick was to use path.getName with the FileReader; without it, I got a FileNotFoundException.
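That trick works because files registered with job.addCacheFile are copied to each node and, in Hadoop 2.x, symlinked into the task's working directory under their base name, so the bare file name resolves locally. A minimal illustration, reusing the file variable from the setup() above:

// The cache file lives in HDFS, but at task launch a local copy is
// symlinked into the container's working directory under its base name:
File localCopy = new File(new Path(file).getName()); // resolves to "abc.dat" in the CWD
BufferedReader reader = new BufferedReader(new FileReader(localCopy));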
Instead of

DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), conf);

try this:

DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), job.getConfiguration());

(A commenter replied that it is still not finding the file.)
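A possible reason the suggestion above matters: new Job(conf) makes a copy of the Configuration, so registering the file on job.getConfiguration() guarantees the job's own copy carries the cache entry regardless of ordering. A minimal ordering sketch, with the path and job name taken from the question:

Configuration conf = new Configuration();
Job job = new Job(conf, "CountEachWord Job");
// Register the cache file on the job's own copy of the configuration:
DistributedCache.addCacheFile(new URI("/user/training/listOfWords"),
        job.getConfiguration());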