MapReduce example for reading ORC files

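I wrote a MapReduce job to analyze some files. However, some of the files were created by Hive and are in ORC format.

Can these ORC files be analyzed the same way as a TextFile? And will the MapReduce output be text?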

What you need is an ORC InputFormat: either OrcInputFormat (for the old mapred API) or OrcNewInputFormat (for the new mapreduce API).

job.setInputFormatClass(OrcNewInputFormat.class);

Next you need a type struct, a string that describes the schema of your table (most likely defined in your mapper).
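The type struct itself is not shown in the answer, so here is a minimal, dependency-free sketch of what such a string looks like and how it breaks down into field names and field types. The column names in `TS` and the class `TypeStructSketch` are hypothetical placeholders. In a real job you would not parse the string by hand: Hive's `TypeInfoUtils.getTypeInfoFromTypeString(ts)` can be cast to `StructTypeInfo` for a `struct<...>` string, and that object provides the `getAllStructFieldNames()` and `getAllStructFieldTypeInfos()` calls used in this answer.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeStructSketch {
    // Hypothetical schema string; column names and types are placeholders.
    static final String TS = "struct<id:string,amount:int,tags:array<string>>";

    /** Splits the body of a struct<...> string on top-level commas only. */
    static List<String> topLevelFields(String ts) {
        if (!ts.startsWith("struct<") || !ts.endsWith(">")) {
            throw new IllegalArgumentException("not a struct type string: " + ts);
        }
        String body = ts.substring("struct<".length(), ts.length() - 1);
        List<String> fields = new ArrayList<>();
        int depth = 0;  // nesting depth inside array<...>, map<...>, struct<...>
        int start = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                depth--;
            } else if (c == ',' && depth == 0) {
                fields.add(body.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(body.substring(start));
        return fields;
    }

    /** Field names: the part before the first ':' of each top-level field. */
    static List<String> fieldNames(String ts) {
        List<String> names = new ArrayList<>();
        for (String f : topLevelFields(ts)) {
            names.add(f.substring(0, f.indexOf(':')));
        }
        return names;
    }

    /** Field types: the part after the first ':' of each top-level field. */
    static List<String> fieldTypes(String ts) {
        List<String> types = new ArrayList<>();
        for (String f : topLevelFields(ts)) {
            types.add(f.substring(f.indexOf(':') + 1));
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(fieldNames(TS));
        System.out.println(fieldTypes(TS));
    }
}
```

The order of the fields in the string is what matters later: index 0 in the value list corresponds to the first column declared in the type struct.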

Now you can access the fields of the file like this:

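// ti is the StructTypeInfo parsed from the type struct
List<TypeInfo> type_lst = ti.getAllStructFieldTypeInfos();
List<String> field_lst = ti.getAllStructFieldNames();

// value_lst holds the deserialized column values of the current row
rowId = value_lst.get(0).toString();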

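Hello, I tried this, but SerDe.initialize(context.getConfiguration(), properties) does not work; it is not recognized. Error message: "Cannot make a static reference to the non-static method initialize(Configuration, Properties) from the type Deserializer". It does not work for me either: org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcStruct. Can you help?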
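The first error quoted in that comment is a plain Java compile error rather than anything ORC-specific: `initialize` is an instance method, so it has to be called on a SerDe object, not on the type name. A minimal sketch of the difference, assuming a hypothetical `ToySerDe` class that only stands in for Hive's SerDe (it is not a Hive class, and the property value is a placeholder):

```java
import java.util.Properties;

public class StaticVsInstanceSketch {
    /** Hypothetical stand-in for a Hive SerDe; not part of Hive. */
    static class ToySerDe {
        private Properties props;

        // Instance method, like initialize(Configuration, Properties) in the error message.
        void initialize(Properties tableProps) {
            this.props = tableProps;
        }

        String columnTypes() {
            return props.getProperty("columns.types");
        }
    }

    static String run() {
        Properties properties = new Properties();
        properties.setProperty("columns.types", "string:int"); // placeholder value

        // ToySerDe.initialize(properties);
        // ^ would not compile: static reference to a non-static method

        ToySerDe serde = new ToySerDe();  // create an instance first
        serde.initialize(properties);     // then call the method on that instance
        return serde.columnTypes();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In the mapper this would correspond to creating the SerDe object once (for example `new OrcSerde()`) and calling `initialize` on that variable.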

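    // Assumed declaration (not shown in the original snippet): initialize()
    // must be called on a SerDe instance, not on the type.
    OrcSerde serde = new OrcSerde();
    StructObjectInspector soi;
    List<Object> value_lst;
    Properties properties = new Properties();

    // Hive SerDes read the column types from the "columns.types" table property
    properties.setProperty("columns.types", ts);
    serde.initialize(context.getConfiguration(), properties);

    try {
        Object obj = serde.deserialize(value);
        soi = (StructObjectInspector) serde.getObjectInspector();
        value_lst = soi.getStructFieldsDataAsList(obj);
        // Read the first column only after deserialization has succeeded
        rowId = value_lst.get(0).toString();
    } catch (SerDeException e) {
        e.printStackTrace();
    }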