
Hadoop TotalOrderPartitioner gives a "wrong key class" error


I am trying out Hadoop's TotalOrderPartitioner. While doing so, I get the following error: "wrong key class".

Driver code:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;


public class WordCountJobTotalSort {

    public static void main (String args[]) throws Exception
    {
        if (args.length < 2 ) 
        {
            System.out.println("Plz provide I/p and O/p directory ");
            System.exit(-1);
        }

        Job job = new Job();

        job.setJarByClass(WordCountJobTotalSort.class);
        job.setJobName("WordCountJobTotalSort");            
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setMapperClass(WordMapper.class);
        job.setPartitionerClass(TotalOrderPartitioner.class);
        job.setReducerClass(WordReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(2);

        TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path("/tmp/partition.lst"));

        // Note: the sampler draws its samples from the job *input* keys
        // (LongWritable byte offsets in the SequenceFile), while
        // writePartitionFile() writes them to a partition file declared
        // with the map output key class (Text) -- see the error below.
        InputSampler.writePartitionFile(job, new InputSampler.RandomSampler<IntWritable, Text>(1, 2, 2));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Input file:

[cloudera@localhost workspace]$ hadoop fs -text file_seq/part-m-00000
0       hello hello
12      how
20      is
26      your
36      job job

(The keys 0, 12, 20, ... are the byte offsets of each line, i.e. LongWritable, which is exactly what InputSampler ends up sampling.)

Comment out these two lines and run the Hadoop job:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

If that does not work, then after commenting out those two lines you must also set the input and output format classes:

job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);

In my case, I got the same "wrong key class" error because I used a combiner with a custom Writable. When I commented out the combiner, it worked fine.
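For illustration, a minimal sketch of a type-safe combiner for this word-count job (the class name WordCombiner is hypothetical, not from the original post): a combiner runs between map output and the shuffle, so it must consume and emit exactly the declared map output key/value classes; emitting any other Writable as the key reproduces the same failure.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A combiner must be a Reducer whose input *and* output types match the
// job's map output types: Reducer<Text, IntWritable, Text, IntWritable>.
public class WordCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();               // pre-aggregate counts locally
        }
        // Writing a key of any class other than Text here would trigger
        // the same "wrong key class" IOException during the map-side spill.
        context.write(key, new IntWritable(sum));
    }
}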

InputSampler does its sampling on the map side (before shuffle and reduce), and the sampling is performed over the mapper's input keys. We need to make sure the mapper's input key class and output key class are the same; otherwise the MR framework cannot find the right bucket in the sampled key space for the map output key/value pairs.

In this case the input key is LongWritable, so InputSampler builds the partition from a sampled subset of LongWritable keys; but the map output key is Text, so the MR framework cannot match Text keys against that partition.
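To see why the job fails before the map phase even runs, it helps to look at what InputSampler.writePartitionFile does. The sketch below is a simplified paraphrase from reading the Hadoop 2.x source, not the verbatim code: the samples are drawn from the job's input keys, but the partition file is created with the map output key class.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.mapreduce.Job;

// Simplified paraphrase of InputSampler.writePartitionFile (Hadoop 2.x).
public class WritePartitionFileSketch {

    public static <K> void writePartitionFile(Job job, K[] samples, Path dst)
            throws IOException {
        // 'samples' were collected by the Sampler from the job *input*,
        // so in this job they are LongWritable byte offsets.
        Configuration conf = job.getConfiguration();
        FileSystem fs = dst.getFileSystem(conf);

        // The partition file, however, is declared with the *map output*
        // key class -- Text in this job.
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, dst,
                job.getMapOutputKeyClass(), NullWritable.class);

        for (K sample : samples) {
            // Appending a LongWritable to a Text-keyed file throws:
            // java.io.IOException: wrong key class
            writer.append(sample, NullWritable.get());
        }
        writer.close();
    }
}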


We can work around this by introducing a preparation stage: run a first job that rewrites the data as a SequenceFile whose keys already have the final key class (Text), so that the total-sort job's input keys and map output keys match, as sketched below.
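A minimal sketch of such a preparation job, assuming the goal is a SequenceFile keyed by Text (the class names PrepJob and PrepMapper and the tokenizing logic are illustrative, not from the original post):

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Preparation job: reads plain text and writes a SequenceFile with
// (Text, IntWritable) records, so a later total-sort job sees Text
// keys on its *input*.
public class PrepJob {

    public static class PrepMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                context.write(word, ONE);   // key is now Text, not LongWritable
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJarByClass(PrepJob.class);
        job.setJobName("PrepJob");
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(PrepMapper.class);
        job.setNumReduceTasks(0);            // map-only pass
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);   // SequenceFile key class = Text
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The WordCountJobTotalSort job from the question can then point its input at this job's output: the SequenceFile's keys are already Text, so the keys sampled by InputSampler and the declared map output key class finally agree.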

The error message says it all: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.Text. Check the line number where this error occurs and change the type there.

From the error log, I cannot find the exact line number that causes this error. It seems to me that I am not using the Hadoop Java packages correctly, but I am not sure... my knowledge of Hadoop is very limited.

Then look at your own code, at the line mentioned in the trace: at WordCountJobTotalSort.main(WordCountJobTotalSort.java:47).

Hi Krunal... after commenting out those two lines, I still get the same error :(

Hi Krunal... I already had job.setInputFormatClass(SequenceFileInputFormat.class); in my code, and I have now also added job.setOutputFormatClass(SequenceFileOutputFormat.class);, but the code still produces the same error:
[cloudera@localhost workspace]$ hadoop jar WordCountJobTotalSort.jar WordCountJobTotalSort file_seq/part-m-00000 file_out
15/05/18 00:45:13 INFO input.FileInputFormat: Total input paths to process : 1
15/05/18 00:45:13 INFO partition.InputSampler: Using 2 samples
15/05/18 00:45:13 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/05/18 00:45:13 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Exception in thread "main" java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.Text
    at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:1340)
    at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:336)
    at WordCountJobTotalSort.main(WordCountJobTotalSort.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);