Java | Unable to load file from distributed cache at given URI | Getting NullPointerException
I am trying to write a MapReduce job that performs sentiment analysis, using AFINN.txt as the dictionary. I put the file into HDFS and tried to load it through the distributed cache when running the job, but it fails every time:
public class Sentiment_Analysis extends Configured implements Tool {

    public static class Map extends Mapper<LongWritable, Text, Text, Text> {
        private URI[] files;
        private HashMap<String, String> AFINN_map = new HashMap<String, String>();

        @Override
        public void setup(Context context) throws IOException {
            files = DistributedCache.getCacheFiles(context.getConfiguration());
            System.out.println("files:" + files);
            Path path = new Path(files[0]); // here I am getting the exception
            FileSystem fs = FileSystem.get(context.getConfiguration());
            FSDataInputStream in = fs.open(path);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line = "";
            while ((line = br.readLine()) != null) {
                String splits[] = line.split("\t");
                AFINN_map.put(splits[0], splits[1]);
            }
            br.close();
            in.close();
        }

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String twt;
            String line = value.toString();
            String[] tuple = line.split("\\n");
            JSONParser jsonParser = new JSONParser();
            try {
                for (int i = 0; i < tuple.length; i++) {
                    JSONObject obj = (JSONObject) jsonParser.parse(line);
                    String tweet_id = (String) obj.get("id_str");
                    String tweet_text = (String) obj.get("text");
                    twt = (String) obj.get("text");
                    String[] splits = twt.toString().split(" ");
                    int sentiment_sum = 0;
                    for (String word : splits) {
                        if (AFINN_map.containsKey(word)) {
                            Integer x = new Integer(AFINN_map.get(word));
                            sentiment_sum += x;
                        }
                    }
                    context.write(
                            new Text(tweet_id),
                            new Text(tweet_text + "\t----->\t"
                                    + new Text(Integer.toString(sentiment_sum))));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new Sentiment_Analysis(), args);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: Parse <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "SentimentAnalysis");
        DistributedCache.addCacheFile(new URI("hdfs://localhost:50070//sentimentInput//AFINN.txt"), conf);
        job.setJarByClass(Sentiment_Analysis.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        return 0;
    }
}
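As an aside, the mapper's core logic (splitting tab-separated AFINN lines into a map, then summing the scores of known words) can be exercised outside Hadoop. This is a minimal standalone sketch of that logic, not the actual job; the two dictionary entries are real AFINN-111 values used here as a stand-in for the full file:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the mapper's scoring logic:
// parse "word<TAB>score" lines into a map, then sum the
// scores of each whitespace-separated word found in the map.
public class AfinnScoreDemo {
    public static Map<String, Integer> parseAfinn(String[] lines) {
        Map<String, Integer> dict = new HashMap<>();
        for (String line : lines) {
            String[] splits = line.split("\t"); // word<TAB>score
            dict.put(splits[0], Integer.valueOf(splits[1]));
        }
        return dict;
    }

    public static int score(String tweet, Map<String, Integer> dict) {
        int sum = 0;
        for (String word : tweet.split(" ")) {
            sum += dict.getOrDefault(word, 0); // unknown words score 0
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = parseAfinn(new String[] {"good\t3", "bad\t-3"});
        System.out.println(score("a good day", dict)); // prints 3
    }
}
```

Testing this part in isolation helps confirm that any failure in the real job comes from the cache-file loading in setup(), not from the scoring itself.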
But I have already put the file into HDFS using the following command:
bin/hdfs dfs -ls /sentimentInput
18/05/17 12:25:46 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 jeet supergroup 28094 2018-05-17 11:43
/sentimentInput/AFINN.txt
-rw-r--r-- 1 jeet supergroup 13965969 2018-05-17 11:33
/sentimentInput/FlumeData.1440939532959
This shows the file exists, but when I launch the job it fails with the following error:
bin/yarn jar ../sentiment.jar com.jeet.sentiment.Sentiment_Analysis /sentimentInput /sentimentOutput5
Exception in thread "main" java.lang.IllegalArgumentException: Pathname /localhost:50070/sentimentInput/AFINN.txt from hdfs:/localhost:50070/sentimentInput/AFINN.txt is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:195)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:104)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1089)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
Can someone tell me how to specify the correct file path so I can test this code?

Answer: Your URI is missing a /: it should start with hdfs://localhost.....

Edit: try the newer methods for cache files:
job.addCacheFile(uri);
context.getCacheFiles();
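The stack trace is consistent with this diagnosis: "Pathname /localhost:50070/... from hdfs:/localhost:50070/... is not a valid DFS filename" shows the host and port ended up inside the path. With only one slash after the scheme, java.net.URI parses "localhost:50070" as part of the path instead of as the authority. A quick standalone check (plain JDK, no Hadoop needed):

```java
import java.net.URI;

// Demonstrates why "hdfs:/localhost..." (single slash) breaks:
// without "//", there is no authority component, so host:port
// becomes the leading segment of the path, which HDFS then
// rejects as an invalid DFS filename.
public class UriSlashDemo {
    public static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String bad  = "hdfs:/localhost:50070/sentimentInput/AFINN.txt";
        String good = "hdfs://localhost:50070/sentimentInput/AFINN.txt";
        System.out.println("bad:  authority=" + authorityOf(bad)  + " path=" + pathOf(bad));
        System.out.println("good: authority=" + authorityOf(good) + " path=" + pathOf(good));
    }
}
```

The single-slash form yields a null authority and the path /localhost:50070/sentimentInput/AFINN.txt, exactly the pathname the exception complains about; the double-slash form yields authority localhost:50070 and path /sentimentInput/AFINN.txt.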
Comments:

"I don't think the tags are the issue here. Your problem is the path; it just happens that you are trying to do sentiment analysis with MapReduce."

"mrbolichi is right, but since I have to ask a useful question and provide all the information, I did. Can you help?"

"No, sorry, I hope someone else can help you. Knowing which kind of sentiment analysis you are doing adds no information, since your problem is not specific to it."

"Thanks for your answer, but now I get this exception: java.lang.NullPointerException at com.jeet.sentiment.Sentiment_Analysis$Map.setup(Sentiment_Analysis.java:72). I think AFINN.txt is still not being loaded, because the error occurs at the line Path path = new Path(files[0]) in setup()."