Java | Unable to load file from distributed cache at given URI | Getting NullPointerException
I am trying to write a MapReduce job that performs sentiment analysis, using AFINN.txt as the dictionary. I put the file into HDFS and tried to load it through the distributed cache when running the job, but it fails every time:
public class Sentiment_Analysis extends Configured implements Tool {

    public static class Map extends Mapper<LongWritable, Text, Text, Text> {
        private URI[] files;
        private HashMap<String, String> AFINN_map = new HashMap<String, String>();

        @Override
        public void setup(Context context) throws IOException {
            files = DistributedCache.getCacheFiles(context.getConfiguration());
            System.out.println("files:" + files);
            Path path = new Path(files[0]); // here I am getting the exception
            FileSystem fs = FileSystem.get(context.getConfiguration());
            FSDataInputStream in = fs.open(path);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line = "";
            while ((line = br.readLine()) != null) {
                String splits[] = line.split("\t");
                AFINN_map.put(splits[0], splits[1]);
            }
            br.close();
            in.close();
        }

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String twt;
            String line = value.toString();
            String[] tuple = line.split("\\n");
            JSONParser jsonParser = new JSONParser();
            try {
                for (int i = 0; i < tuple.length; i++) {
                    JSONObject obj = (JSONObject) jsonParser.parse(line);
                    String tweet_id = (String) obj.get("id_str");
                    String tweet_text = (String) obj.get("text");
                    twt = (String) obj.get("text");
                    String[] splits = twt.toString().split(" ");
                    int sentiment_sum = 0;
                    for (String word : splits) {
                        if (AFINN_map.containsKey(word)) {
                            Integer x = new Integer(AFINN_map.get(word));
                            sentiment_sum += x;
                        }
                    }
                    context.write(
                            new Text(tweet_id),
                            new Text(tweet_text + "\t----->\t"
                                    + new Text(Integer.toString(sentiment_sum))));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new Sentiment_Analysis(), args);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: Parse <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "SentimentAnalysis");
        DistributedCache.addCacheFile(new URI("hdfs://localhost:50070//sentimentInput//AFINN.txt"), conf);
        job.setJarByClass(Sentiment_Analysis.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        return 0;
    }
}
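As an aside, the mapper's core logic (splitting tab-separated AFINN lines into a map, then summing the scores of known words) can be exercised outside Hadoop. This is a minimal standalone sketch of that logic, not the actual job; the two dictionary entries are real AFINN-111 values used here as a stand-in for the full file:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the mapper's scoring logic:
// parse "word<TAB>score" lines into a map, then sum the
// scores of each whitespace-separated word found in the map.
public class AfinnScoreDemo {
    public static Map<String, Integer> parseAfinn(String[] lines) {
        Map<String, Integer> dict = new HashMap<>();
        for (String line : lines) {
            String[] splits = line.split("\t"); // word<TAB>score
            dict.put(splits[0], Integer.valueOf(splits[1]));
        }
        return dict;
    }

    public static int score(String tweet, Map<String, Integer> dict) {
        int sum = 0;
        for (String word : tweet.split(" ")) {
            sum += dict.getOrDefault(word, 0); // unknown words score 0
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = parseAfinn(new String[] {"good\t3", "bad\t-3"});
        System.out.println(score("a good day", dict)); // prints 3
    }
}
```

Testing this part in isolation helps confirm that any failure in the real job comes from the cache-file loading in setup(), not from the scoring itself.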
But I have already put the file into HDFS using the following command:
bin/hdfs dfs -ls /sentimentInput
18/05/17 12:25:46 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 jeet supergroup 28094 2018-05-17 11:43
/sentimentInput/AFINN.txt
-rw-r--r-- 1 jeet supergroup 13965969 2018-05-17 11:33
/sentimentInput/FlumeData.1440939532959
This shows the file exists, but when I launch the job it fails with the following error:
bin/yarn jar ../sentiment.jar com.jeet.sentiment.Sentiment_Analysis /sentimentInput /sentimentOutput5
Exception in thread "main" java.lang.IllegalArgumentException: Pathname /localhost:50070/sentimentInput/AFINN.txt from hdfs:/localhost:50070/sentimentInput/AFINN.txt is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:195)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:104)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1089)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
Can someone tell me how to specify the correct file path so I can test this code?

Answer: Your URI is missing a /: it should start with hdfs://localhost.....

Edit: try the newer methods for cache files:
job.addCacheFile(uri);
context.getCacheFiles();
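The stack trace is consistent with this diagnosis: "Pathname /localhost:50070/... from hdfs:/localhost:50070/... is not a valid DFS filename" shows the host and port ended up inside the path. With only one slash after the scheme, java.net.URI parses "localhost:50070" as part of the path instead of as the authority. A quick standalone check (plain JDK, no Hadoop needed):

```java
import java.net.URI;

// Demonstrates why "hdfs:/localhost..." (single slash) breaks:
// without "//", there is no authority component, so host:port
// becomes the leading segment of the path, which HDFS then
// rejects as an invalid DFS filename.
public class UriSlashDemo {
    public static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String bad  = "hdfs:/localhost:50070/sentimentInput/AFINN.txt";
        String good = "hdfs://localhost:50070/sentimentInput/AFINN.txt";
        System.out.println("bad:  authority=" + authorityOf(bad)  + " path=" + pathOf(bad));
        System.out.println("good: authority=" + authorityOf(good) + " path=" + pathOf(good));
    }
}
```

The single-slash form yields a null authority and the path /localhost:50070/sentimentInput/AFINN.txt, exactly the pathname the exception complains about; the double-slash form yields authority localhost:50070 and path /sentimentInput/AFINN.txt.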
Comments:

"I don't think the tags are the issue here. Your problem is the path; it just happens that you are trying to do sentiment analysis with MapReduce."

"mrbolichi is right, but since I have to ask a useful question and provide all the information, I did. Can you help?"

"No, sorry, I hope someone else can help you. Knowing which kind of sentiment analysis you are doing adds no information, since your problem is not specific to it."

"Thanks for your answer, but now I get this exception: java.lang.NullPointerException at com.jeet.sentiment.Sentiment_Analysis$Map.setup(Sentiment_Analysis.java:72). I think AFINN.txt is still not being loaded, because the error occurs at the line Path path = new Path(files[0]) in setup()."