Java MapReduce secondary sort doesn't work - Java, Hadoop, MapReduce, Secondary Sort - Fatal编程技术网



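I'm trying to do a secondary sort in MapReduce using a composite key that consists of:

  • String natural key = program name

  • Long key used for sorting = time in milliseconds since 1970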
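For reference, the result a secondary sort on such a key is supposed to produce can be sketched in plain Java without Hadoop. This is a minimal sketch, not the asker's code: the nested `Key` class is a hypothetical stand-in for `ProgramKey`, the sort order is program then timestamp, and grouping starts a new group only when the program changes. The timestamps are the epoch-millisecond values of the 17:51:03 and 17:51:04 UTC times in the log below.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SecondarySortSketch {
    // Hypothetical stand-in for ProgramKey: natural key plus epoch-millisecond timestamp.
    static final class Key {
        final String program;
        final long timestamp;

        Key(String program, long timestamp) {
            this.program = program;
            this.timestamp = timestamp;
        }
    }

    // Order the sort comparator should impose: program first, then timestamp.
    static final Comparator<Key> SORT_ORDER =
            Comparator.comparing((Key k) -> k.program).thenComparingLong(k -> k.timestamp);

    // Sorts the keys, then starts a new group only when the program changes.
    // This is what grouping on the natural key should yield: one group
    // (one reduce() call) per program, with timestamps in ascending order.
    static List<List<Key>> sortAndGroup(List<Key> input) {
        List<Key> sorted = new ArrayList<>(input);
        sorted.sort(SORT_ORDER);
        List<List<Key>> groups = new ArrayList<>();
        for (Key k : sorted) {
            List<Key> last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            if (last != null && last.get(0).program.equals(k.program)) {
                last.add(k); // same natural key: same reduce() call
            } else {
                List<Key> group = new ArrayList<>();
                group.add(k);
                groups.add(group); // natural key changed: new reduce() call
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> mapOutput = List.of(
                new Key("the voice", 1418320264000L),
                new Key("top gear", 1418320264000L),
                new Key("the voice", 1418320263000L));
        for (List<Key> group : sortAndGroup(mapOutput)) {
            StringBuilder line = new StringBuilder(group.get(0).program).append(':');
            for (Key k : group) {
                line.append(' ').append(k.timestamp);
            }
            System.out.println(line);
        }
    }
}
```

Running `main` prints `the voice: 1418320263000 1418320264000` followed by `top gear: 1418320264000`: both "the voice" records arrive in a single group, oldest first.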

The problem is that after sorting, I get a large number of reduce calls based on the entire composite key.

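Through debugging, I've verified that the hashCode and compare functions are correct. In the debug log, each block comes from a different reducer, which suggests that the grouping or the partitioning isn't working. From the debug log: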

14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=the voice
14/12/14 00:55:12 INFO popularitweet.EtanReducer: the voice: Thu Dec 11 17:51:03 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: the voice: Thu Dec 11 17:51:03 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key the voice ended

14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=top gear
14/12/14 00:55:12 INFO popularitweet.EtanReducer: top gear: Thu Dec 11 17:51:04 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key top gear ended

14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=american horror story
14/12/14 00:55:12 INFO popularitweet.EtanReducer: american horror story: Thu Dec 11 17:51:04 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key american horror story ended

14/12/14 00:55:12 INFO popularitweet.EtanReducer: key=the voice
14/12/14 00:55:12 INFO popularitweet.EtanReducer: the voice: Thu Dec 11 17:51:04 +0000 2014
14/12/14 00:55:12 INFO popularitweet.EtanReducer: key the voice ended
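As you can see, "the voice" is sent to two different reducers, with different timestamps. Any help would be greatly appreciated. The composite key is the following class: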

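import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableUtils;

// Composite key: natural key (program name) + secondary sort field (epoch millis).
public class ProgramKey implements WritableComparable<ProgramKey> {
    private String program;
    private Long timestamp;

    // Hadoop instantiates keys via reflection, so a no-arg constructor is required.
    public ProgramKey() {
    }

    public ProgramKey(String program, Long timestamp) {
        this.program = program;
        this.timestamp = timestamp;
    }

    // Getters used by the comparators.
    public String getProgram() {
        return program;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    // Full ordering: program name first, then timestamp.
    @Override
    public int compareTo(ProgramKey o) {
        int result = program.compareTo(o.program);
        if (result == 0) {
            result = timestamp.compareTo(o.timestamp);
        }
        return result;
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        WritableUtils.writeString(dataOutput, program);
        dataOutput.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        program = WritableUtils.readString(dataInput);
        timestamp = dataInput.readLong();
    }
}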
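The key class above only defines the full sort order. Keeping every timestamp of one program on the same reducer is the job of `ProgramKeyPartitioner`, which the job configuration sets but the post does not show. A plain-Java sketch of the rule it must follow, assuming it reuses the `(hashCode() & Integer.MAX_VALUE) % numReduceTasks` formula of Hadoop's default `HashPartitioner` applied to the program name only; `partitionFor` is a hypothetical helper, not the asker's code:

```java
final class NaturalKeyPartitionSketch {
    // Chooses a reducer from the program name only, using the same formula as
    // Hadoop's default HashPartitioner. The timestamp is deliberately ignored,
    // so keys that differ only in timestamp always land on the same reducer.
    static int partitionFor(String program, long timestamp, int numReduceTasks) {
        return (program.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int first = partitionFor("the voice", 1418320263000L, 10);
        int second = partitionFor("the voice", 1418320264000L, 10);
        System.out.println(first == second); // true
    }
}
```

In a real job, this logic would live in `getPartition(ProgramKey key, TweetObject value, int numPartitions)` of a `Partitioner<ProgramKey, TweetObject>` subclass.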

Edit

Your time comparator seems to have a typo: you set ts2 to a when it should be b:

ProgramKey ts1 = (ProgramKey)a;
ProgramKey ts2 = (ProgramKey)a;
When it should be:

ProgramKey ts1 = (ProgramKey)a;
ProgramKey ts2 = (ProgramKey)b;
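This sorts the key/value pairs incorrectly and invalidates the grouping comparator's assumption that the key/value pairs arrive already sorted: the grouping comparator only compares each key with the one before it, so records of the same program that are not adjacent end up in separate groups.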
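The effect of the typo can be reproduced without Hadoop. In this sketch (with a simplified, hypothetical `Key` class), the buggy comparator compares `a` with itself and therefore always returns 0, so the sort leaves the keys in arrival order; grouping on the program name then splits "the voice" into two groups, matching the four blocks in the question's log. Java's stable `List.sort` stands in for the shuffle; in a real job the order produced by an always-0 comparator is arbitrary rather than the arrival order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ComparatorTypoSketch {
    // Simplified, hypothetical stand-in for ProgramKey.
    static final class Key {
        final String program;
        final long timestamp;

        Key(String program, long timestamp) {
            this.program = program;
            this.timestamp = timestamp;
        }
    }

    // Mirrors the typo: both variables refer to the first argument,
    // so every comparison is "a vs a" and returns 0.
    static final Comparator<Key> BUGGY = (a, b) -> {
        Key ts1 = a;
        Key ts2 = a; // typo: should be b
        int result = ts1.program.compareTo(ts2.program);
        return result != 0 ? result : Long.compare(ts1.timestamp, ts2.timestamp);
    };

    static final Comparator<Key> FIXED = (a, b) -> {
        int result = a.program.compareTo(b.program);
        return result != 0 ? result : Long.compare(a.timestamp, b.timestamp);
    };

    // Sorts with the given comparator, then counts groups the way a grouping
    // comparator on the program name would: a new group starts whenever the
    // program differs from the previous key's program.
    static int countGroups(List<Key> input, Comparator<Key> sortComparator) {
        List<Key> sorted = new ArrayList<>(input);
        sorted.sort(sortComparator); // stable: an always-0 comparator keeps input order
        int groups = 0;
        String previous = null;
        for (Key k : sorted) {
            if (!k.program.equals(previous)) {
                groups++;
                previous = k.program;
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> mapOutput = List.of(
                new Key("the voice", 1418320263000L),
                new Key("top gear", 1418320264000L),
                new Key("american horror story", 1418320264000L),
                new Key("the voice", 1418320264000L));
        System.out.println(countGroups(mapOutput, BUGGY)); // 4: "the voice" is split
        System.out.println(countGroups(mapOutput, FIXED)); // 3: one group per program
    }
}
```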


Also check that the original program names are in UTF-8, because WritableUtils assumes that. Is your system's default code page UTF-8 as well?
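To see why the charset matters for the natural key: if the same raw bytes are decoded with different charsets before being put into the key, the two program strings differ and will compare, hash and group as different programs. A minimal sketch; the show name "café" is a hypothetical non-ASCII example, written with a `\u00e9` escape so the encoding of the source file itself doesn't matter:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetKeySketch {
    // Decodes raw input bytes into a program name with an explicit charset,
    // as a mapper would before building the natural key.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Hypothetical show name: "caf" followed by e-acute.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);        // 4 chars
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1); // 5 chars
        System.out.println(asUtf8.equals(asLatin1));                 // false
        System.out.println(asUtf8.length() + " vs " + asLatin1.length()); // 4 vs 5
    }
}
```

Passing a charset explicitly, rather than relying on the platform default, keeps the natural key identical on every node.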

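I went through the GroupingComparator, Partitioner and SortComparator classes, as well as the job code, and they all look correct. Why not try the following: set fewer reducers and see which reduce keys you get. Another test: print the key inside the reducer and see whether different composite keys go to different reducers.

I did as you suggested (job.setNumReduceTasks(10)) and printed only the keys. I got:

14/12/14 09:12:51 INFO EtanReducer: new reducer
14/12/14 09:12:51 INFO EtanReducer: key=x factor
14/12/14 09:12:51 INFO EtanReducer: x factor: 1418320302000
14/12/14 09:12:51 INFO EtanReducer: x factor: 1418320302000
14/12/14 09:12:51 INFO EtanReducer: key x factor ended
14/12/14 09:12:12:51 INFO EtanReducer: new reducer
14/12/14 09:12:51 INFO EtanReducer: key=x factor
14/12/14 09:12:51 INFO EtanReducer: x factor: 1418320302000
14/12/14 09:12:51 INFO EtanReducer: key x factor

In short, the problem shows up again. Maybe I have this problem because I'm running Hadoop locally? Or should I not be emitting the key by calling "context.write(new ProgramKey(program.toString(),DateUtils.textToDate(timeStamp.getTime()),passedTweet)"?

Sorry I can't be of more help; I'm still thinking about how to solve this. Maybe you can find some pattern in which keys that differ only in timestamp are consumed by different reducers, which shouldn't happen.

Wow, good eye. I went through the whole code and didn't notice it.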
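One way to follow the suggestion to look for a pattern: collect the `key=` lines from all reducer logs in order. With correct partitioning, sorting and grouping, each program starts exactly one reduce() call across the whole job, so any program that appears more than once points at the problem. A small sketch; `splitPrograms` is a hypothetical helper, and the example input is the four `key=` lines from the question's first log:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SplitGroupDetector {
    // Given the natural keys in the order reduce() was invoked, returns every
    // program that started more than one reduce group (sorted by name).
    static Set<String> splitPrograms(List<String> groupKeysInOrder) {
        Map<String, Integer> groupsPerProgram = new LinkedHashMap<>();
        for (String program : groupKeysInOrder) {
            groupsPerProgram.merge(program, 1, Integer::sum);
        }
        Set<String> split = new TreeSet<>();
        groupsPerProgram.forEach((program, count) -> {
            if (count > 1) {
                split.add(program);
            }
        });
        return split;
    }

    public static void main(String[] args) {
        List<String> logged = List.of("the voice", "top gear", "american horror story", "the voice");
        System.out.println(splitPrograms(logged)); // [the voice]
    }
}
```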
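import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Sort comparator: orders by program name, then by timestamp.
public class TimeStampComparator extends WritableComparator {
    protected TimeStampComparator() {
        // true = create key instances, so compare() receives deserialized keys
        super(ProgramKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        ProgramKey ts1 = (ProgramKey) a;
        ProgramKey ts2 = (ProgramKey) a; // BUG (see the answer above): should be (ProgramKey) b

        int result = ts1.getProgram().compareTo(ts2.getProgram());
        if (result == 0) {
            result = ts1.getTimestamp().compareTo(ts2.getTimestamp());
        }
        return result;
    }
}
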
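import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class EtanMapReduce {
    private static final Log logger = LogFactory.getLog(EtanMapReduce.class);

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        // Assumption: input and output paths are passed as arguments
        // (the original snippet does not show where they come from).
        Path inputPath = new Path(args[0]);
        Path outputDir = new Path(args[1]);

        // Create configuration
        Configuration conf = new Configuration();

        // Create job (Job.getInstance replaces the deprecated Job constructor)
        Job job = Job.getInstance(conf, "test1");
        job.setJarByClass(EtanMapReduce.class);

        // Partition and group on the program name; sort on the full composite key
        job.setPartitionerClass(ProgramKeyPartitioner.class);
        job.setGroupingComparatorClass(ProgramKeyGroupingComparator.class);
        job.setSortComparatorClass(TimeStampComparator.class);

        // Setup MapReduce
        job.setMapperClass(EtanMapper.class);
        job.setMapOutputKeyClass(ProgramKey.class);
        job.setMapOutputValueClass(TweetObject.class);
        job.setReducerClass(EtanReducer.class);

        // Specify final output key / value
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(TweetObject.class);

        // Input
        FileInputFormat.addInputPath(job, inputPath);
        job.setInputFormatClass(TextInputFormat.class);

        // Output
        FileOutputFormat.setOutputPath(job, outputDir);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Delete output if it exists
        FileSystem hdfs = FileSystem.get(conf);
        if (hdfs.exists(outputDir)) {
            hdfs.delete(outputDir, true);
        }

        // Execute job
        logger.info("starting job");
        int code = job.waitForCompletion(true) ? 0 : 1;
        System.exit(code);
    }
}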