Best way to handle byte arrays in a Hadoop MR job

hadoop mapreduce

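I need to compare the serialized byte arrays of my MR job's keys in a comparator, but I can't find a good way to work with byte arrays. The object being serialized/deserialized has the following fields: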

public class GeneralKey {
  String name;
  String type;
  ...other String fields ..
}

@Override
public void readFields(DataInput input) throws IOException {
  name = input.readUTF();
  type = input.readUTF();
  ...
}

@Override
public void write(DataOutput output) throws IOException {
  output.writeUTF(name);
  output.writeUTF(type);
  ...
}

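The serialized byte array looks like this:

name: [0, 0], 2 bytes. These 2 bytes hold the length of name; since the length is 0, name is empty.
type: [0, 3, 97, 98, 99], 5 bytes. The first 2 bytes hold the length of type and say the value is 3 bytes long, so the next 3 bytes have to be read: 97, 98, 99, which is the string "abc".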
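The layout described above can be walked with plain JDK code, without any extra library. The following is only a sketch: `UtfFieldReader`, `readLength`, and `readFields` are hypothetical names made up for illustration, not Hadoop APIs, and the decoding step assumes the fields contain ASCII text (`writeUTF` emits modified UTF-8, which matches standard UTF-8 for ASCII but not for every character).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): walks a buffer of writeUTF-encoded fields.
class UtfFieldReader {

    // Reads the 2-byte, big-endian, unsigned length prefix that writeUTF emits.
    static int readLength(byte[] buf, int offset) {
        return ((buf[offset] & 0xff) << 8) | (buf[offset + 1] & 0xff);
    }

    // Decodes every length-prefixed field found in buf[start, start + length).
    static List<String> readFields(byte[] buf, int start, int length) {
        List<String> fields = new ArrayList<String>();
        int pos = start;
        int end = start + length;
        while (pos < end) {
            int len = readLength(buf, pos);
            pos += 2; // skip the length prefix
            // Assumption: ASCII content, so standard UTF-8 decoding is safe here.
            fields.add(new String(buf, pos, len, StandardCharsets.UTF_8));
            pos += len; // jump to the next field's length prefix
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Serialize name = "" and type = "abc" the same way write() does.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF("");
        out.writeUTF("abc");
        byte[] buf = bytes.toByteArray(); // 0, 0, 0, 3, 97, 98, 99
        System.out.println(readFields(buf, 0, buf.length));
    }
}
```

The `& 0xff` masks matter because Java bytes are signed; without them a length byte of 128 or more would be read as a negative number.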

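I'd like to know whether there is a better way to handle the byte array: something that reads the first two bytes as an integer and then uses that to decide how many of the following bytes to read and convert to a string. I'm using Hadoop 1.0.3 and running the job on AWS. I tried HBase's Bytes class, but for some reason it fails with a class-not-found error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes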

Is there another library that makes byte arrays easy to work with? Thanks.
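For the comparator itself, the strings do not have to be rebuilt at all: each field can be compared directly on its raw bytes. The sketch below uses only the JDK. `RawUtfKeyComparator` and its methods are hypothetical names, although `compare` deliberately takes the same six arguments as Hadoop's `RawComparator.compare(byte[], int, int, byte[], int, int)` so the logic could be moved into a real comparator. It assumes both keys are sequences of `writeUTF` fields and that unsigned byte order is an acceptable sort order (for ASCII it matches `String.compareTo`). Hadoop's own `WritableComparator` also provides static `readUnsignedShort` and `compareBytes` helpers that should cover the two low-level steps, if you would rather not hand-roll them.

```java
// Hypothetical helper (not part of Hadoop): compares two serialized keys field by field
// without deserializing them into String objects.
class RawUtfKeyComparator {

    // Reads the 2-byte, big-endian, unsigned length prefix that writeUTF emits.
    static int readLength(byte[] buf, int offset) {
        return ((buf[offset] & 0xff) << 8) | (buf[offset + 1] & 0xff);
    }

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    static int compareRange(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen; // equal prefix: the shorter range sorts first
    }

    // Same argument shape as RawComparator.compare: (buffer, start, length) for each key.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int p1 = s1;
        int p2 = s2;
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        while (p1 < end1 && p2 < end2) {
            int len1 = readLength(b1, p1);
            int len2 = readLength(b2, p2);
            int c = compareRange(b1, p1 + 2, len1, b2, p2 + 2, len2);
            if (c != 0) {
                return c; // first differing field decides the order
            }
            p1 += 2 + len1;
            p2 += 2 + len2;
        }
        // All shared fields are equal; a key with leftover bytes sorts last.
        return (end1 - p1) - (end2 - p2);
    }
}
```

Comparing field by field, rather than running one byte comparison over the whole key, keeps the length prefixes from influencing the order: a short name such as "b" would otherwise sort before a longer name such as "aa" just because its length prefix is smaller.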

I use byte arrays as keys and values, but with the following built-in type: