Java Hadoop DistributedCache throws FileNotFound error
I am trying to count only the words from a listOfWords file that appear in a given input file. I am getting a FileNotFound error even though I have verified that the file is in the correct location in HDFS.

Inside the driver:
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), conf);
Job job = new Job(conf,"CountEachWord Job");
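(For context, the rest of the driver is not shown in the question; a minimal continuation, with the class names and argument handling below being assumptions, might look like this.)

// Hypothetical continuation of the driver above; CountEachWord and
// CountEachWordMapper are assumed names, not from the question.
job.setJarByClass(CountEachWord.class);
job.setMapperClass(CountEachWordMapper.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);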
Inside the mapper:
private Path[] ref_file;
ArrayList<String> globalList = new ArrayList<String>();

public void setup(Context context) throws IOException {
    this.ref_file = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    FileSystem fs = FileSystem.get(context.getConfiguration());
    FSDataInputStream in_file = fs.open(ref_file[0]);
    System.out.println("File opened");
    BufferedReader br = new BufferedReader(new InputStreamReader(in_file)); // each line of reference file
    System.out.println("BufferReader invoked");
    String eachLine = null;
    while ((eachLine = br.readLine()) != null) {
        System.out.println("eachLine is: " + eachLine);
        globalList.add(eachLine);
    }
}
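(One likely cause, worth checking: DistributedCache.getLocalCacheFiles returns paths on the task node's local disk, so opening them with the HDFS FileSystem can fail. A minimal sketch of reading the localized copy through the local filesystem instead:)

// Sketch: getLocalCacheFiles returns *local* paths, so open them with the
// local FileSystem rather than the HDFS one.
Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
FileSystem localFs = FileSystem.getLocal(context.getConfiguration());
BufferedReader br = new BufferedReader(new InputStreamReader(localFs.open(cached[0])));
String line;
while ((line = br.readLine()) != null) {
    globalList.add(line);
}
br.close();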
I have verified that the file in question exists in HDFS. I also tried using the local runner. It still doesn't work.

You can try this to retrieve the file:

URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());

Then you can iterate over the files. Try it like this.

Driver:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path cachefile = new Path("path/to/file");
FileStatus[] list = fs.globStatus(cachefile);
for (FileStatus status : list) {
    DistributedCache.addCacheFile(status.getPath().toUri(), conf);
}
In the mapper's setup():
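(A sketch of what that setup() might look like, following the getCacheFiles call quoted above; the filename check and loop body here are assumptions:)

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    // Retrieve the URIs registered by the driver and pick out the word list.
    URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());
    for (URI uri : files) {
        if (uri.getPath().endsWith("listOfWords")) {
            // open and read this file here
        }
    }
}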
In the main method, I use this:
Job job = Job.getInstance();
job.setJarByClass(DistributedCacheExample.class);
job.setJobName("Distributed cache example");
job.addCacheFile(new Path("/user/cloudera/datasets/abc.dat").toUri());
Then in the Mapper, I used this boilerplate:
protected void setup(Context context) throws IOException, InterruptedException {
    URI[] files = context.getCacheFiles();
    for (URI file : files) {
        if (file.getPath().contains("abc.dat")) {
            Path path = new Path(file);
            BufferedReader reader = new BufferedReader(new FileReader(path.getName()));
            String line = reader.readLine();
            while (line != null) {
                ......
                line = reader.readLine(); // advance to the next line, otherwise this loops forever
            }
        }
    }
}
I am working with these dependencies:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.3</version>
</dependency>
My trick was to use path.getName with the FileReader; without it, I got a FileNotFoundException.
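That trick works because files registered with job.addCacheFile are copied to each node and, in Hadoop 2.x, symlinked into the task's working directory under their base name, so the bare file name resolves locally. A minimal illustration, reusing the file variable from the setup() above:

// The cache file lives in HDFS, but at task launch a local copy is
// symlinked into the container's working directory under its base name:
File localCopy = new File(new Path(file).getName()); // resolves to "abc.dat" in the CWD
BufferedReader reader = new BufferedReader(new FileReader(localCopy));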
Instead of

DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), conf);

try this:

DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), job.getConfiguration());

(A commenter replied that it is still not finding the file.)
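A possible reason the suggestion above matters: new Job(conf) makes a copy of the Configuration, so registering the file on job.getConfiguration() guarantees the job's own copy carries the cache entry regardless of ordering. A minimal ordering sketch, with the path and job name taken from the question:

Configuration conf = new Configuration();
Job job = new Job(conf, "CountEachWord Job");
// Register the cache file on the job's own copy of the configuration:
DistributedCache.addCacheFile(new URI("/user/training/listOfWords"),
        job.getConfiguration());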