Hadoop: Can I use Text as the value when writing to the context in MapReduce?

hadoop mapreduce

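I have a scenario where I need to compute the average of two columns in MapReduce. In the mapper I read the values from the file, concatenate them into a Text, and then try to write that to the context, as shown below: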

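class TestMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Text outputkey;
    private Text OutputVal;

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // more code here
        context.write(outputkey, OutputVal);
    }
}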

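You should use a custom data type here, for example a TextPair class with two Text elements, to hold the data you need. Below is sample code that emits a pair of strings as the value in the mapper's context.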

// Mapper's map code
// MISSING is a constant defined elsewhere in the mapper class (not shown here).
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, TextPair>.Context context)
        throws IOException, InterruptedException {

    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt rejected leading plus signs before Java 7
        airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
        airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
        System.out.println("Year " + year + " " + airTemperature);
        // Emit the year as the key and a (temperature, count) pair as the value.
        context.write(new Text(year), new TextPair(String.valueOf(airTemperature), 1));
    }
}

// TextPair: custom data type code below

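import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class TextPair implements WritableComparable<TextPair> {

    private Text first;
    private Text second;

    // Default constructor is a must
    public TextPair() {
        this.first = new Text();
        this.second = new Text();
    }

    public TextPair(String first, int second) {
        this.first = new Text(first);
        this.second = new Text(String.valueOf(second));
    }

    // The answer left the methods below unimplemented; this is one
    // straightforward way to fill them in by delegating to Text.

    // Serialize both fields in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }

    // Deserialize in the same order as write().
    @Override
    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }

    // Order by first, then by second.
    @Override
    public int compareTo(TextPair other) {
        int cmp = first.compareTo(other.first);
        return cmp != 0 ? cmp : second.compareTo(other.second);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TextPair)) {
            return false;
        }
        TextPair tp = (TextPair) o;
        return first.equals(tp.first) && second.equals(tp.second);
    }

    public Text getFirst() {
        return first;
    }

    public Text getSecond() {
        return second;
    }

    @Override
    public String toString() {
        return this.first + "\t" + this.second + "\t";
    }

}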
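
The comment in TextPair says the default constructor is a must. A rough way to see why: a framework that deserializes a value typically creates an empty instance first and then fills it from the byte stream. The sketch below imitates that round trip in plain Java so it runs without Hadoop on the classpath. PairSketch and roundTrip are hypothetical names, and writeUTF/readUTF stand in for the encoding that Text uses; this is an illustration of the pattern, not Hadoop's actual code path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for TextPair: same method shapes as a Writable,
// but built only on java.io (an assumption made so the sketch is standalone).
class PairSketch {
    private String first = "";
    private String second = "";

    // No-arg constructor: needed so an empty instance can be created
    // before readFields() populates it.
    PairSketch() { }

    PairSketch(String first, int second) {
        this.first = first;
        this.second = String.valueOf(second);
    }

    // Serialize both fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
    }

    // Deserialize in exactly the same order as write().
    void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
    }

    String getFirst() { return first; }
    String getSecond() { return second; }

    // Write a pair to bytes, then rebuild it through an empty instance.
    static PairSketch roundTrip(PairSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            PairSketch copy = new PairSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PairSketch copy = roundTrip(new PairSketch("22", 1));
        System.out.println(copy.getFirst() + "\t" + copy.getSecond());
    }
}
```

If write() and readFields() disagree on field order, the copy comes back with its fields swapped or corrupted, which is why the two methods are best kept side by side.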

If you need more details, see Hadoop: The Definitive Guide. Hope this helps.

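Hi @Pushkin, I followed the logic you described, but I got the same kind of error. I consulted the Definitive Guide, but it did not work for me. This is the error I get: Error: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.FloatWritable, received TextPair

I have seen this happen when the output types declared in the job code (where you configure the mapper and reducer) differ from the types the mapper actually emits. It would be very helpful if you could share the complete code for the Job, Mapper, and Reducer.

This worked perfectly. In my case we do not need TextPair; it can be done with Text itself. Thank you very much for your help.

Yes, you can use Text as the value. In fact, you can use any other data type supported by the Hadoop framework. What problem are you running into? If there is one, please share your code and the stack trace of the exception.

Thanks for your attention, Azim. I got this working by changing the data types in the job object.

It would be even better if you could read O'Reilly's book Hadoop: The Definitive Guide.

Great. Very good. Problem solved :)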
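
For the original goal, averaging two columns that travel together in a single Text value, the reducer side comes down to splitting each value and keeping two running sums. Here is a minimal sketch in plain Java, assuming the mapper joined the two columns with a comma. The class ColumnAverages, the method average, and the comma delimiter are all hypothetical; in a real reducer the values would arrive as Iterable<Text> and each one would be converted with toString() first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: averages two columns carried as "col1,col2" strings.
class ColumnAverages {

    // Returns {average of column 1, average of column 2}.
    static double[] average(Iterable<String> values) {
        double sum1 = 0;
        double sum2 = 0;
        long count = 0;
        for (String v : values) {
            String[] parts = v.split(","); // comma delimiter is an assumption
            sum1 += Double.parseDouble(parts[0].trim());
            sum2 += Double.parseDouble(parts[1].trim());
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no values to average");
        }
        return new double[] { sum1 / count, sum2 / count };
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("10,1.5", "20,2.5", "30,3.5");
        double[] avg = average(values);
        System.out.println(avg[0] + " " + avg[1]);
    }
}
```

Whichever value type the mapper emits, the map output value class declared on the job has to match it, which is the mismatch reported in the error above.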