Hadoop partitioner does not seem to work on a single node?

hadoop mapreduce

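I have written MapReduce code with a custom partitioner. The partitioner assigns keys to reducers based on certain conditions. In the driver class I set setNumReduceTasks to 6. However, I am testing this code on a single machine, and I get only one reducer output file instead of six. Does the partitioner not work on a single machine? Do I need a multi-node cluster to see the effect of a custom partitioner?

Any insight would be greatly appreciated.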

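I have a two-node cluster on one machine. When I execute a job on it, I do the following:

Specify the number of reducers, for example two.

When you set the number of reducers to a value greater than 1, the Partitioner is always applied, even on a single-node cluster.

I have tested the following code on a single-node cluster, and it works as expected: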

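import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Person is a custom key class; its source is not shown in this post.
public final class SortMapReduce extends Configured implements Tool {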

public static void main(final String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new SortMapReduce(), args);
    System.exit(res);
}

public int run(final String[] args) throws Exception {

    Path inputPath = new Path(args[0]);
    Path outputPath = new Path(args[1]);

    Configuration conf = super.getConf();

    Job job = Job.getInstance(conf);

    job.setJarByClass(SortMapReduce.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormatClass(KeyValueTextInputFormat.class);

    job.setMapOutputKeyClass(Person.class);
    job.setMapOutputValueClass(Text.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setPartitionerClass(PersonNamePartitioner.class);

    job.setNumReduceTasks(5);

    FileInputFormat.setInputPaths(job, inputPath);
    FileOutputFormat.setOutputPath(job, outputPath);

    if (job.waitForCompletion(true)) {
        return 0;
    }
    return 1;
}

public static class Map extends Mapper<Text, Text, Person, Text> {

    private Person outputKey = new Person();

    @Override
    protected void map(Text pointID, Text firstName, Context context) throws IOException, InterruptedException {
        outputKey.set(pointID.toString(), firstName.toString());
        context.write(outputKey, firstName);
    }
}

public static class Reduce extends Reducer<Person, Text, Text, Text> {

    Text pointID = new Text();

    @Override
    public void reduce(Person key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        pointID.set(key.getpointID());
        for (Text firstName : values) {
            context.write(pointID, firstName);
        }
    }
}

}

The partitioner class:

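import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each Person key to a reducer based on the hash of its pointID.
public class PersonNamePartitioner extends Partitioner<Person, Text> {

@Override
public int getPartition(Person key, Text value, int numPartitions) {
    // Mask the sign bit rather than calling Math.abs, which stays negative for Integer.MIN_VALUE.
    return ((key.getpointID().hashCode() * 127) & Integer.MAX_VALUE) % numPartitions;
}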

}
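
A hash-based partitioner such as the one above has to return an index between 0 and numPartitions - 1, so the sign of the hash arithmetic matters. The following is a minimal sketch in plain Java with no Hadoop dependency. The helper class PartitionIndex and its method names are hypothetical and are not part of the answer's code. It compares a Math.abs form with a bit-mask form for the one input whose product overflows to Integer.MIN_VALUE.

```java
// Hypothetical helper for illustration only; not a Hadoop API.
final class PartitionIndex {

    // Same arithmetic as Math.abs(hash * 127) % numPartitions.
    // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so the result can be negative.
    static int withAbs(int hash, int numPartitions) {
        return Math.abs(hash * 127) % numPartitions;
    }

    // Clearing the sign bit keeps the result in the range [0, numPartitions).
    static int withMask(int hash, int numPartitions) {
        return ((hash * 127) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Integer.MIN_VALUE * 127 wraps around to Integer.MIN_VALUE in int arithmetic.
        int worstCase = Integer.MIN_VALUE;
        System.out.println(withAbs(worstCase, 5));  // prints -3
        System.out.println(withMask(worstCase, 5)); // prints 0
    }
}
```

The negative case is rare, since it needs a key whose hash is exactly Integer.MIN_VALUE, but the masked form avoids it at no extra cost.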

Command to run it:

hadoop jar /home/hdfs/SecondarySort.jar org.test.SortMapReduce /demo/data/Customer/acct.txt /demo/data/Customer/output2

Thanks,

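Take a close look at your custom partitioner. It may be returning the same partition value for every key passed to it.

In that case it is an inefficient partitioner that sends all keys to the same reducer. Even with the number of reducers set to 6, a single reducer receives all the key-value pairs and the other 5 reducers have nothing to process.

As a result, you only get output from the one reducer that processed all the records.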
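
One way to check for this before submitting a job is to run the partition function over a sample of keys and count how many land in each partition. The sketch below is plain Java with no Hadoop dependency. The class name PartitionSpread, the sample keys, and the constant partitioner are all hypothetical and only illustrate the degenerate case described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntBiFunction;

// Hypothetical stand-alone check; the names here are illustrative, not Hadoop APIs.
final class PartitionSpread {

    // Counts how many keys each partition index receives.
    static int[] countPerPartition(List<String> keys, int numPartitions,
                                   ToIntBiFunction<String, Integer> partitioner) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitioner.applyAsInt(key, numPartitions)]++;
        }
        return counts;
    }

    // Number of partitions that received at least one key.
    static int nonEmpty(int[] counts) {
        return (int) Arrays.stream(counts).filter(c -> c > 0).count();
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f"); // made-up sample keys

        // Degenerate partitioner: every key maps to partition 0.
        int[] constant = countPerPartition(keys, 6, (k, n) -> 0);
        // Hash-based partitioner with the sign bit cleared.
        int[] hashed = countPerPartition(keys, 6,
                (k, n) -> (k.hashCode() & Integer.MAX_VALUE) % n);

        System.out.println(Arrays.toString(constant)); // prints [6, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(hashed));   // prints [1, 1, 1, 1, 1, 1]
    }
}
```

With the constant partitioner only one partition is non-empty, which corresponds to one reducer doing all the work while the others receive nothing.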

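Does the partitioner not work on a single machine?

The partitioner works on a single-machine pseudo-distributed cluster as well.

Is a multi-node cluster needed to see the effect of a custom partitioner?

No.

@gsmaras Thanks for your answer. But I am only testing my code in NetBeans. After that I will package a jar and run it on multiple nodes. Will the partitioner work on a single machine when run from the IDE?

You're welcome. I have used Eclipse as well, but I can't help with NetBeans. IDEs may have internal factors that affect this. In any case, good question; I'll upvote it.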
