
Parsing a flat JSON file with MapReduce in Java

Tags: java, hadoop, mapreduce

My task is to parse JSON objects from HDFS and write separate files back into HDFS. Below is my code:

package com.main;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.json.JSONException;
import org.json.JSONObject;

public class JsonMain {

    public static class Mapperclass extends Mapper<LongWritable, Text, Text, Text>{

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{

            String regId;
            String time;
            String line = value.toString();
            String[] tuple = line.split("\\n");
            try{
                for(int i=0;i<tuple.length; i++){
                    JSONObject obj = new JSONObject(tuple[i]);
                    regId = obj.getString("regId");
                    time = obj.getString("time");
                    context.write(new Text(regId), new Text(time));
                }
            }catch(JSONException e){
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // TODO Auto-generated method stub

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(JsonMain.class);
        job.setMapperClass(Mapperclass.class);
        //job.setCombinerClass(IntSumReducer.class);        
        //job.setReducerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Note: I have already included all the dependency jars in my project.

I executed the following command:

hadoop jar JsonMapper.jar com.main.JsonMain /user/cloudera/FlatJson/FlatJson.txt output007

Here is the error message I received:

17/11/01 08:11:12 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1509542757670_0003/
17/11/01 08:11:12 INFO mapreduce.Job: Running job: job_1509542757670_0003
17/11/01 08:13:33 INFO mapreduce.Job: Job job_1509542757670_0003 running in uber mode : false
17/11/01 08:13:33 INFO mapreduce.Job:  map 0% reduce 0%
17/11/01 08:15:32 INFO mapreduce.Job: Task Id : attempt_1509542757670_0003_m_000000_0, Status : FAILED

Error: java.lang.ClassNotFoundException: org.json.JSONException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)

"java.lang.ClassNotFoundException: org.json.JSONException" ==> I have already imported this jar into my project. Let me know what is wrong here.

Let's debug your problem step by step:


  • Please run `jar -tvf JsonMapper.jar | grep JSONException`; you will see that this class is not present in your jar.
  • Please understand that including a dependency in your project via a dependency-management system (such as mvn) does not guarantee its availability inside the jar you build.
  • Please use the shade plugin to bundle all the dependency jars into a shaded fat jar.
  • "Error: java.lang.ClassNotFoundException: org.json.JSONException" --> this issue is now resolved.

    Previously I had placed the jar at the path /home/jar/java-json.jar.

    I moved the jar to the path "/usr/lib/hadoop-mapreduce/", included it there, and added the jar to the project; after that it worked:


    cp java-json.jar /usr/lib/hadoop-mapreduce
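The shade-plugin step suggested above can be sketched as a Maven pom.xml fragment. This is a minimal sketch, not the poster's actual build file: the plugin version shown is an assumption, and it presumes the org.json dependency is already declared in the pom with the default (compile) scope.

```xml
<!-- Hypothetical pom.xml fragment: packages all compile-scope dependencies,
     including org.json, into a single shaded fat jar at `mvn package` time -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, `jar -tvf target/JsonMapper.jar | grep JSONException` should show org/json/JSONException.class inside the jar, so no jar needs to be copied into /usr/lib/hadoop-mapreduce by hand.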

    It's worth pointing out that Apache Spark or Drill could parse this file in less than 5 lines of code. @cricket_007

    => The above issue is resolved. The MR job now completes successfully, but the output file is created as an empty file.

    Then context.write is never hit, and your try-catch always catches. line.split("\\n") makes no sense, because by default MapReduce always reads one line at a time, so the for loop is pointless. @cricket_007

    Thanks! How can I read the JSON object values in MapReduce without the loop? Please guide me. For example: {"regId":"TbEtvRH","time":1509073895112}; I need the values "TbEtvRH" and "1509073895112".

    jar -tvf JsonMapper.jar | grep JSONException =>
    781 Wed Nov 01 10:50:28 PDT 2017 com/amazonaws/util/JSONException.class
    807 Wed Nov 01 10:50:32 PDT 2017 com/cloudera/com/amazonaws/util/json/JSONException.class
    587 Wed Nov 01 10:50:48 PDT 2017 org/json/JSONException.class
    700 Wed Nov 01 10:50:48 PDT 2017 org/json/JSONException.java
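As the comments point out, each call to map() already receives exactly one line, so the mapper can parse value.toString() directly with new JSONObject(...) and drop the split/loop entirely. As a self-contained illustration of the per-line field extraction (using java.util.regex instead of org.json so it runs without a Hadoop classpath; the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlatJsonLine {
    // Matches "regId":"<value>" and "time":<digits> in one flat JSON record
    private static final Pattern REG_ID = Pattern.compile("\"regId\"\\s*:\\s*\"([^\"]+)\"");
    private static final Pattern TIME   = Pattern.compile("\"time\"\\s*:\\s*(\\d+)");

    /** Returns {regId, time} for a single JSON line, or null if a field is missing. */
    public static String[] parse(String line) {
        Matcher r = REG_ID.matcher(line);
        Matcher t = TIME.matcher(line);
        if (r.find() && t.find()) {
            return new String[] { r.group(1), t.group(1) };
        }
        return null;
    }

    public static void main(String[] args) {
        // Inside the real mapper this would be value.toString() for one input line
        String[] kv = parse("{\"regId\":\"TbEtvRH\",\"time\":1509073895112}");
        System.out.println(kv[0] + "\t" + kv[1]); // prints: TbEtvRH	1509073895112
    }
}
```

In the actual mapper the equivalent change is to remove the split("\\n") and the for loop, build one JSONObject from the line, and call context.write once per map() invocation.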
    