Why is the LongWritable (key) not used in the Mapper class? (Java, Hadoop, MapReduce)


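Mapper:

The Mapper class is a generic type with four formal type parameters that specify the input key, input value, output key, and output value types of the map function.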

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The key (the line's offset) is ignored; only the line text is needed.
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}


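Reducer:

Again, four formal type parameters specify the input and output types, this time for the reduce function. The input types of the reduce function must match the output types of the map function: Text and IntWritable.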

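import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Keep the largest temperature seen for this year.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}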


But in this example the key is never used.

What is the point of a key in the Mapper that is never used at all?

Why is the key a LongWritable?

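The input format used in this example is TextInputFormat, which produces key/value pairs as LongWritable/Text.

Here the LongWritable key is the offset of the current line, that is, the byte position in the input file at which the line read from the input split begins. The Text value is the actual current line itself.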
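To make the offset/line pairing concrete, here is a minimal plain-Java sketch of the idea. It is not Hadoop's record reader: the class `LineOffsetDemo` and its `records` method are hypothetical names invented for illustration, and the sketch assumes UTF-8 text with `\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Hadoop API.
public class LineOffsetDemo {

    /** Returns one (byte offset of line start, line text) pair per line. */
    public static List<Map.Entry<Long, String>> records(String content) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        long offset = 0; // byte position where the current line begins
        int start = 0;   // char index where the current line begins
        while (start < content.length()) {
            int end = content.indexOf('\n', start);
            if (end < 0) {
                end = content.length(); // last line without a trailing newline
            }
            String line = content.substring(start, end);
            out.add(new AbstractMap.SimpleEntry<>(offset, line));
            // Advance by the encoded length of the line plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> r : records("first\nsecond line\nthird")) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

For the three-line input in `main`, the keys are 0, 6 and 18: each key is the number of bytes that precede the line, which is the role the LongWritable plays in the mapper above.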

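We cannot say that the line offset the LongWritable key provides for each line of the file is useless. It depends on the use case; in your case, this input key simply is not important.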
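As one hypothetical use case where the offset does matter, consider a file whose first line is a header: the record with offset 0 is that header, so it can be skipped by looking at the key alone. The sketch below shows the check in plain Java; `HeaderSkipDemo` and `dataLines` are made-up names, and in a real Mapper the same test would be made on `key.get()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Hadoop API.
public class HeaderSkipDemo {

    /** Keeps every line except the one that starts at byte offset 0. */
    public static List<String> dataLines(long[] offsets, String[] lines) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            if (offsets[i] == 0L) {
                continue; // offset 0 marks the first line of the file (the header)
            }
            out.add(lines[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 10L, 18L};
        String[] lines = {"year,temp", "1950,22", "1951,-11"};
        System.out.println(dataLines(offsets, lines));
    }
}
```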

There are many InputFormat types other than TextInputFormat. They parse the lines of the input file in different ways and produce the corresponding key/value pairs.

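For example, KeyValueTextInputFormat (like TextInputFormat, a subclass of FileInputFormat) splits each line on a configured delimiter and produces key/value pairs as Text/Text.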
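The delimiter-based splitting can be sketched in plain Java as follows. This is an illustration of the idea, not Hadoop's own code: `KeyValueSplitDemo` and `split` are hypothetical names, and the sketch assumes the line is cut at the first occurrence of the separator (a tab by default in Hadoop), with the whole line becoming the key when no separator is present.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical illustration only; not part of the Hadoop API.
public class KeyValueSplitDemo {

    /** Splits a line into (key, value) at the first occurrence of the separator. */
    public static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: the whole line is the key and the value is empty.
            return new AbstractMap.SimpleEntry<>(line, "");
        }
        return new AbstractMap.SimpleEntry<>(
                line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = split("1950\t22", '\t');
        System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
    }
}
```

With such a format the mapper would be declared with Text as its input key type instead of LongWritable, because the key now carries data from the line rather than a position.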

Edit: a few input formats and their key/value types are listed below.

KeyValueTextInputFormat   Text/Text
NLineInputFormat          LongWritable/Text
FixedLengthInputFormat    LongWritable/BytesWritable

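Apart from these, some input formats are generic and take custom key/value types when they are declared, for example SequenceFileInputFormat and CombineFileInputFormat. Please have a look at the input formats chapter of Hadoop: The Definitive Guide.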

Hope this helps.

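JobConf returns LongWritable as the default key class if you do not set one explicitly, for example with job.setOutputKeyClass(...) or job.setMapOutputKeyClass(...).

The JobConf code internally: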

public Class<?> getOutputKeyClass() {
    return getClass(JobContext.OUTPUT_KEY_CLASS,
                    LongWritable.class, Object.class);
}


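Can you give some examples of keys other than LongWritable?

If you wanted to use CombineFileInputFormat, you would presumably use the file name and the offset as the key (to keep track of which file a value was read from). So it is up to you: when you write your own InputFormat and find some key more useful than the plain offset, you can use it.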