Hadoop partitioner not working properly


hadoop mapreduce


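I am trying to write a MapReduce scenario in which I created some user clickstream data in JSON form. After that, I wrote a Mapper class to extract the required data from the file. My Mapper code is:-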

private final static String URL = "u";

private final static String Country_Code = "c";

private final static String Known_User = "nk";

private final static String Session_Start_time = "hc";

private final static String User_Id = "user";

private final static String Event_Id = "event";

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String aJSONRecord = value.toString();
    try {
        JSONObject aJSONObject = new JSONObject(aJSONRecord);
        StringBuilder aOutputString = new StringBuilder();
        aOutputString.append(aJSONObject.get(User_Id).toString()+",");
        aOutputString.append(aJSONObject.get(Event_Id).toString()+",");
        aOutputString.append(aJSONObject.get(URL).toString()+",");
        aOutputString.append(aJSONObject.get(Known_User)+",");
        aOutputString.append(aJSONObject.get(Session_Start_time)+",");
        aOutputString.append(aJSONObject.get(Country_Code)+",");
        context.write(new Text(aOutputString.toString()), key);
        System.out.println(aOutputString.toString());
    } catch (JSONException e) {
        e.printStackTrace();
    }
}

public class ClickStreamPartitioner extends Partitioner<Text, LongWritable> {

public int getPartition(Text key, LongWritable value, int numPartitions) {
    String aRecord = key.toString();
    if(aRecord.contains(Country_code_Us)){
        return 0;
    }else{
        return 1;
    }
}

}
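A note on the matching itself: aRecord.contains(...) looks for the country code anywhere in the record, so a user ID such as USER7 or a URL containing /US would also be routed to partition 0. Below is a minimal plain-Java sketch of a stricter check, assuming the field order built by the mapper above (user, event, u, nk, hc, c) and that no field value contains a comma. CountryPartition and partitionFor are hypothetical names; inside ClickStreamPartitioner, getPartition could simply return CountryPartition.partitionFor(key.toString(), numPartitions).

```java
/**
 * Hypothetical helper holding the partition decision in plain Java so it can be
 * tested without a cluster. Assumes the mapper's record layout
 * user,event,u,nk,hc,c (country code is the sixth field) and no commas inside fields.
 */
final class CountryPartition {

    private static final String US = "US";      // value given in the comments
    private static final int COUNTRY_FIELD = 5; // zero-based index of the "c" field

    private CountryPartition() {}

    /** Returns 0 for US records and 1 otherwise, always within [0, numPartitions). */
    static int partitionFor(String record, int numPartitions) {
        // Limit -1 keeps empty trailing fields instead of silently dropping them.
        String[] fields = record.split(",", -1);
        // Compare only the trimmed country field, not the whole record.
        boolean isUs = fields.length > COUNTRY_FIELD
                && US.equals(fields[COUNTRY_FIELD].trim());
        int desired = isUs ? 0 : 1;
        // With a single reducer, 1 % 1 == 0, so the result stays in range.
        return desired % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("u1,click,http://example.com/US,1,100,US,", 2)); // prints 0
        System.out.println(partitionFor("USER7,click,http://example.com,1,100,IN,", 2)); // prints 1
    }
}
```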

My Reducer code is:-

public void reduce(Text key, Iterable<LongWritable> values,
        Context context) throws IOException, InterruptedException {
        String aString =  key.toString();
        context.write(new Text(aString.trim()), new Text(""));  

}

And this is my Driver code:-

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Click Stream Analyzer");
    job.setNumReduceTasks(2);
    job.setJarByClass(ClickStreamDriver.class);
    job.setMapperClass(ClickStreamMapper.class);
    job.setReducerClass(ClickStreamReducer.class);
    job.setPartitionerClass(ClickStreamPartitioner.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(LongWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);

}

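Here I am trying to partition the data based on the country code. But it is not working: it sends every record to a single reducer file, which I think is the other file rather than the reduce file created for US.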

One more thing: when I look at the mapper's output, it shows some extra space added at the end of every record.

Please suggest if I am making any mistake here.
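On the mapper side, every append adds a comma after its field, so each record ends with a dangling comma, and any whitespace inside a JSON value is copied verbatim. A minimal sketch using java.util.StringJoiner that trims each field and leaves no trailing separator; RecordBuilder is a hypothetical helper, and inside map() its arguments would be aJSONObject.get(User_Id).toString() and the other field values in the same order.

```java
import java.util.StringJoiner;

/** Hypothetical helper: joins mapper fields with commas, no trailing separator. */
final class RecordBuilder {

    private RecordBuilder() {}

    static String buildRecord(String... fields) {
        StringJoiner joiner = new StringJoiner(",");
        for (String field : fields) {
            // Trim stray whitespace so a value like "US " is emitted as "US".
            joiner.add(field.trim());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for one parsed JSON record.
        System.out.println(buildRecord("u42", "click", "http://example.com/a", "true", "1400000000", "US "));
        // prints u42,click,http://example.com/a,true,1400000000,US
    }
}
```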

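Your partitioning problem is caused by the number of reducers. If it is 1, all your data will be sent to that single reducer, regardless of what your partitioner returns. So setting mapred.reduce.tasks to 2 will solve this issue. Or you can simply write:

job.setNumReduceTasks(2);

to get the two reducers you want.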
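Why the number of reducers matters: as I understand the map task in Hadoop 2.x, when a job has exactly one reduce task the configured partitioner is not consulted and every record goes to partition 0; with more reduce tasks, a partitioner that returns a value outside [0, numPartitions) makes the task fail with an "Illegal partition" error. The plain-Java model below captures those two rules; it is a simplification rather than Hadoop source, and EffectivePartition and choose are hypothetical names.

```java
import java.util.function.ToIntBiFunction;

/**
 * Simplified model (not Hadoop source) of how a map task picks the reducer
 * for a record. EffectivePartition and choose are hypothetical names.
 */
final class EffectivePartition {

    private EffectivePartition() {}

    static int choose(String key, int numReduceTasks,
                      ToIntBiFunction<String, Integer> partitioner) {
        if (numReduceTasks < 1) {
            // Zero reducers means a map-only job: output is written without partitioning.
            throw new IllegalArgumentException("map-only job has no partitions");
        }
        if (numReduceTasks == 1) {
            // The custom partitioner is bypassed; everything lands in partition 0.
            return 0;
        }
        int partition = partitioner.applyAsInt(key, numReduceTasks);
        if (partition < 0 || partition >= numReduceTasks) {
            // Hadoop fails the task with an "Illegal partition" error in this case.
            throw new IllegalStateException("Illegal partition for " + key + " (" + partition + ")");
        }
        return partition;
    }

    public static void main(String[] args) {
        // The question's rule: 0 for records containing "US", otherwise 1.
        ToIntBiFunction<String, Integer> byCountry = (k, n) -> k.contains("US") ? 0 : 1;
        System.out.println(choose("u1,click,IN,", 1, byCountry)); // prints 0: partitioner ignored
        System.out.println(choose("u1,click,IN,", 2, byCountry)); // prints 1
    }
}
```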

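Unless you have a very specific requirement, you can set the number of reducers through the job parameter

mapred.reduce.tasks (in 1.x) & mapreduce.job.reduces (in 2.x)

or

job.setNumReduceTasks(2)

as per mark91's answer.

However, you can also hand the work over to the Hadoop framework by using the API below, which assigns records to reducers by the hash of the key (HashPartitioner is already the default partitioner). Note that the framework derives the number of map tasks from the file and block size; the number of reducers still comes from the setting above and defaults to 1.

job.setPartitionerClass(HashPartitioner.class);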
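For reference, HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Since the key in this job is the whole CSV record, hashing spreads records across reducers without regard to country, so it does not by itself give one file per country. The sketch below uses String.hashCode() as a stand-in; Hadoop's Text hashes its UTF-8 bytes with a different function, so the concrete partition numbers will generally differ. HashPartitionSketch is a hypothetical name.

```java
/**
 * Plain-Java sketch of the hash-partitioning formula. String.hashCode()
 * stands in for Text.hashCode(), which hashes the UTF-8 bytes differently,
 * so real partition numbers will generally not match these.
 */
final class HashPartitionSketch {

    private HashPartitionSketch() {}

    static int partition(String key, int numReduceTasks) {
        // Clearing the sign bit keeps the value non-negative even when
        // hashCode() is negative; % then maps it into [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] records = {
            "u1,click,http://example.com,1,100,US,",
            "u2,click,http://example.com,1,100,US,",
            "u3,click,http://example.com,1,100,IN,"
        };
        for (String record : records) {
            // Records from the same country can land in different partitions.
            System.out.println(partition(record, 2) + "  " + record);
        }
    }
}
```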

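I used NullWritable and it works. Now I can see the records being partitioned into different files. Because I was using LongWritable as the empty value instead of NullWritable, a space was added at the end of each line, so US was listed as "US " and the partitioner could not split the records.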
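The trailing whitespace is most likely TextOutputFormat's key/value separator rather than part of the key: when the reducer writes new Text("") as the value, the separator (a tab by default) is still written after the key, while a NullWritable value suppresses it. A simplified plain-Java model of that rule follows, with a Java null standing in for NullWritable; TextLineSketch and formatLine are hypothetical names. This formatting happens only when output is written, after partitioning, so the partitioner itself never sees the tab.

```java
/**
 * Simplified model of how TextOutputFormat writes one key/value pair.
 * A Java null stands in for NullWritable; the names are hypothetical.
 */
final class TextLineSketch {

    static final String SEPARATOR = "\t"; // default key/value separator

    private TextLineSketch() {}

    static String formatLine(String key, String value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return ""; // nothing is written, not even a newline
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            // An empty Text value is not "null", so the tab is still written.
            line.append(SEPARATOR);
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(formatLine("u1,US", ""));   // "u1,US\t\n": trailing tab
        System.out.print(formatLine("u1,US", null)); // "u1,US\n": no separator
    }
}
```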

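What is Country_code_US?

Country_code_US = "US";

Are you sure you have US in your input data?

This is a bit unrelated, but I am not sure why you are outputting a LongWritable from the mapper. Output NullWritable.get() instead and set the output value class to NullWritable.class. Do the same for the reducer value. Creating a new Text("") for every key will be a huge resource drain!

Yes, I have US in my records.

Can you add a sysout in the getPartition method to confirm whether it is actually being hit?

The OP didn't mention their driver, so we don't know whether that is the problem.

Sorry, I have now updated the question to include the driver code as well.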