Java: "Type mismatch in key from map" when replacing Mapper with MultithreadedMapper

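I want to implement a multithreaded mapper for my MapReduce job.

To do this, I replaced Mapper with MultithreadedMapper in my working code.

Here is the exception I get: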

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:862)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:549)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$SubMapRecordWriter.write(MultithreadedMapper.java:211)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:264)
Here is the job setup:

 public static void main(String[] args) {
    try {
        if (args.length != 2) {
            System.err.println("Usage: MapReduceMain <input path> <output path>");
            System.exit(123);
        }
        Job job = new Job();
        job.setJarByClass(MapReduceMain.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileSystem fs = FileSystem.get(URI.create(args[0]), job.getConfiguration());
        FileStatus[] files = fs.listStatus(new Path(args[0]));
        for(FileStatus sfs:files){
            FileInputFormat.addInputPath(job, sfs.getPath());
        }
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MyMultithreadMapper.class);
        job.setReducerClass(MyReducer.class);
        MultithreadedMapper.setNumberOfThreads(job, MyMultithreadMapper.nThreads);

        job.setOutputKeyClass(IntWritable.class); 
        job.setOutputValueClass(MyPage.class);

        job.setOutputFormatClass(SequenceFileOutputFormat.class);//write the result as sequential file

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Here is the mapper code:

public class MyMultithreadMapper extends MultithreadedMapper<LongWritable, Text, IntWritable, MyPage> {

ConcurrentLinkedQueue<MyScraper>    scrapers    = new ConcurrentLinkedQueue<MyScraper>();

public static final int             nThreads    = 5;

public MyMultithreadMapper() {
    for (int i = 0; i < nThreads; i++) {
        scrapers.add(new MyScraper());
    }
}

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    MyScraper scraper = scrapers.poll();

    MyPage result = null;
    for (int i = 0; i < 10; i++) {
        try {
            result = scraper.scrapPage(value.toString(), true);
            break;
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    if (result == null) {
        result = new MyPage();
        result.setUrl(key.toString());
    }

    context.write(new IntWritable(result.getUrl().hashCode()), result);

    scrapers.add(scraper);
}
}

Why on earth is this happening?

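Here is what needs to be done:

MultithreadedMapper.setMapperClass(job, MyMapper.class);

MyMapper must implement the map logic.

The MultithreadedMapper subclass must be left empty.

The reason is that MultithreadedMapper never calls its own map(). Each of its threads runs the mapper class registered through setMapperClass, and when none is registered it falls back to the base Mapper, whose map() writes the input key through unchanged. That is why the trace shows Mapper.map emitting a LongWritable key where IntWritable was expected.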
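The delegation behind this answer can be illustrated without Hadoop. The following is only a simplified model, not Hadoop's code: `SimpleMapper`, `SimpleMultithreadedMapper`, `ScrapingMapper` and the other names are hypothetical stand-ins. It assumes that the worker threads instantiate a separately configured delegate class and never call `map()` on the multithreaded object itself. In a real driver the corresponding calls would typically be `job.setMapperClass(MultithreadedMapper.class)` together with `MultithreadedMapper.setMapperClass(job, MyMapper.class)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model only: these classes imitate the delegation idea and are NOT Hadoop's API.
class DelegationSketch {

    /** Stand-in for Mapper: the default map() passes the input key through unchanged. */
    static class SimpleMapper {
        Object map(long offset, String line) {
            return Long.valueOf(offset); // identity, like the LongWritable key in the stack trace
        }
    }

    /** Hypothetical user mapper that holds the real logic and emits an Integer key. */
    static class ScrapingMapper extends SimpleMapper {
        @Override
        Object map(long offset, String line) {
            return Integer.valueOf(line.hashCode());
        }
    }

    /** Stand-in for MultithreadedMapper: its threads run a configured delegate class. */
    static class SimpleMultithreadedMapper extends SimpleMapper {
        // Falls back to the identity mapper until a delegate is configured.
        Class<? extends SimpleMapper> delegateClass = SimpleMapper.class;
        int threads = 2;

        List<Object> run(List<String> lines) {
            List<Object> keys = Collections.synchronizedList(new ArrayList<>());
            List<Thread> workers = new ArrayList<>();
            try {
                for (int t = 0; t < threads; t++) {
                    // One fresh delegate per thread; this.map() is never consulted.
                    SimpleMapper delegate = delegateClass.getDeclaredConstructor().newInstance();
                    final int first = t;
                    Thread worker = new Thread(() -> {
                        for (int i = first; i < lines.size(); i += threads) {
                            keys.add(delegate.map(i, lines.get(i)));
                        }
                    });
                    workers.add(worker);
                    worker.start();
                }
                for (Thread worker : workers) {
                    worker.join();
                }
            } catch (ReflectiveOperationException | InterruptedException e) {
                throw new IllegalStateException(e);
            }
            return keys;
        }
    }

    /** Mirrors the question: the logic sits in an override on the multithreaded subclass. */
    static class OverridingMultithreadedMapper extends SimpleMultithreadedMapper {
        @Override
        Object map(long offset, String line) {
            return Integer.valueOf(line.hashCode()); // never called by run()
        }
    }

    static List<Object> runLikeTheQuestion(List<String> lines) {
        return new OverridingMultithreadedMapper().run(lines);
    }

    static List<Object> runLikeTheAnswer(List<String> lines) {
        SimpleMultithreadedMapper runner = new SimpleMultithreadedMapper();
        runner.delegateClass = ScrapingMapper.class; // plays the role of setMapperClass
        return runner.run(lines);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("http://a.example", "http://b.example", "http://c.example");
        // Prints "Long": the override is ignored and the identity delegate runs.
        System.out.println(runLikeTheQuestion(lines).get(0).getClass().getSimpleName());
        // Prints "Integer": the configured delegate produces the expected key type.
        System.out.println(runLikeTheAnswer(lines).get(0).getClass().getSimpleName());
    }
}
```

Because each thread in this model gets its own delegate instance, state stored in fields of the delegate is not shared between threads. If the real MultithreadedMapper behaves the same way in your Hadoop version, a single MyScraper field in MyMapper may be enough and the scraper queue from the question may be unnecessary. Check this against the version you run.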