Java Pig outputSchema: creating a schema for a tuple
I am trying to define an output schema that should be a tuple containing two other tuples, i.e.
stats:Tuple(c:Tuple(),d:Tuple)
The code below does not work as expected. It somehow produces a structure with an extra level of nesting:
stats:tuple(b:tuple(c:tuple(),d:tuple()))
Here is the output produced by describe:
sourceData: {com.mortardata.pig.dataspliter_36: (stats: ((name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray),(name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray)))}
Is it possible to create a structure like the one below? That would mean removing tuple b from the previous example.
grunt> describe sourceData;
sourceData: {t: (s: (name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray),n: (name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray))}
This is the outputSchema code that does not work as expected:
public Schema outputSchema(Schema input) {
    Schema sensTuple = new Schema();
    sensTuple.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("customerId", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("VIN", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("birth_date", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("fuel_mileage", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("fuel_consumption", DataType.CHARARRAY));

    Schema nonSensTuple = new Schema();
    nonSensTuple.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("customerId", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("VIN", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("birth_date", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("fuel_mileage", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("fuel_consumption", DataType.CHARARRAY));

    Schema parentTuple = new Schema();
    parentTuple.add(new Schema.FieldSchema(null, sensTuple, DataType.TUPLE));
    parentTuple.add(new Schema.FieldSchema(null, nonSensTuple, DataType.TUPLE));

    Schema outputSchema = new Schema();
    outputSchema.add(new Schema.FieldSchema("stats", parentTuple, DataType.TUPLE));

    return new Schema(new Schema.FieldSchema(
            getSchemaName(this.getClass().getName().toLowerCase(), input),
            outputSchema, DataType.TUPLE));
}
The UDF's exec method returns:
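public Tuple exec(Tuple tuple) throws IOException {
    // tuple1 and tuple2 are built from the input earlier in the method (not shown in the post)
    Tuple parentTuple = mTupleFactory.newTuple();
    parentTuple.append(tuple1);
    parentTuple.append(tuple2);
    return parentTuple;
}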
Thanks, everyone.
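
One way to see where the extra level comes from: in the outputSchema code above, every FieldSchema of type TUPLE that wraps a Schema adds one pair of parentheses to what describe prints, and the code wraps three times (the inner tuples, "stats", and the field named by getSchemaName). The sketch below is a plain-Java stand-in model, not Pig's API: the Field class, the shortened column list, and the alias "udf_36" are hypothetical and only mimic the nesting. It suggests that naming the two inner tuples and wrapping them once gives the desired shape. In Pig terms that would presumably mean giving the inner fields the aliases "s" and "n" instead of null and returning new Schema(new Schema.FieldSchema("t", parentTuple, DataType.TUPLE)) directly, without the intermediate outputSchema; this has not been tested against a Pig installation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Stand-in model of schema nesting. This is NOT Pig's Schema/FieldSchema API.
final class SchemaNestingSketch {

    static final class Field {
        final String alias;          // null models an unnamed field
        final List<Field> children;  // null models a chararray leaf

        Field(String alias, List<Field> children) {
            this.alias = alias;
            this.children = children;
        }

        static Field leaf(String alias) {
            return new Field(alias, null);
        }

        static Field tuple(String alias, Field... children) {
            return new Field(alias, Arrays.asList(children));
        }

        // Renders the field roughly the way describe prints a schema.
        String describe() {
            String prefix = (alias == null) ? "" : alias + ": ";
            if (children == null) {
                return prefix + "chararray";
            }
            return prefix + children.stream()
                    .map(Field::describe)
                    .collect(Collectors.joining(",", "(", ")"));
        }
    }

    // Shortened column list (assumption: two columns are enough to show the shape).
    static Field[] columns() {
        return new Field[] {Field.leaf("name"), Field.leaf("VIN")};
    }

    // Mirrors the question: unnamed inner tuples, a "stats" tuple, then one more wrapper.
    static Field asInQuestion() {
        Field stats = Field.tuple("stats",
                Field.tuple(null, columns()),
                Field.tuple(null, columns()));
        return Field.tuple("udf_36", stats);
    }

    // Desired shape: named inner tuples wrapped exactly once.
    static Field flattened() {
        return Field.tuple("t",
                Field.tuple("s", columns()),
                Field.tuple("n", columns()));
    }

    public static void main(String[] args) {
        System.out.println(asInQuestion().describe());
        System.out.println(flattened().describe());
    }
}
```

If the schema is changed this way, the tuple returned by exec (a parent tuple holding tuple1 and tuple2) already has the matching two-level shape, so exec itself would likely not need to change.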