Apache Pig: splitting a tuple with multiple bags into multiple tuples


apache-pig


My data looks like this:

{(2000),(1800),(2700)} {(2014),(1500),(1900)} etc. I created a Java UDF:

    DataBag bag = (DataBag) input.get(0);

    Tuple categoryCode = null;
    Tuple auxiliary = TupleFactory.getInstance().newTuple(3);

    int i = 0;
    for (Iterator<Tuple> code = bag.iterator(); code.hasNext();) {
        categoryCode = code.next();
        auxiliary.set(i, categoryCode.get(0).toString());
        i += 1;
    }

    return auxiliary.toDelimitedString(",");


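I want my output in separate columns, like this:

2000 1800 2700
2014 1500 1900
etc.

Instead, my UDF puts each row into a single column:

2000,1800,2700
2014,1500,1900
etc.

Is there another way to do this? Any input would be appreciated.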

You can return the tuple as-is from the UDF (instead of joining it into a string) and flatten it in the Pig script.
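A minimal sketch of that suggestion, assuming the Pig jar is on the classpath. The class name BagToColumns is hypothetical and not from the question. The UDF returns a Tuple rather than a String, and it sizes the tuple from the bag instead of hard-coding 3:

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Hypothetical UDF name; copies the first field of every tuple in the
// input bag into one output tuple, one field per bag element.
public class BagToColumns extends EvalFunc<Tuple> {
    @Override
    public Tuple exec(Tuple input) throws IOException {
        // Pass nulls through so empty input does not fail the job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        DataBag bag = (DataBag) input.get(0);

        // Size the output from the bag rather than assuming 3 elements.
        Tuple out = TupleFactory.getInstance().newTuple((int) bag.size());
        int i = 0;
        // Note: Pig bags are unordered, so column order follows whatever
        // order the bag happens to iterate in.
        for (Tuple t : bag) {
            out.set(i++, t.get(0));
        }
        return out;
    }
}
```

In the Pig script, something like `B = FOREACH A GENERATE FLATTEN(BagToColumns(codes));` would then promote the tuple's fields to separate columns. The relation A and the field name codes are placeholders for your own names, and the jar containing the class has to be registered with REGISTER first.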

Can you post the full UDF code?

    public class BagToAtom extends EvalFunc<String> {
        public String exec(Tuple input) throws IOException {
            DataBag bag = (DataBag) input.get(0);
            Tuple categoryCode = null;
            Tuple auxiliary = TupleFactory.getInstance().newTuple(3);
            int i = 0;
            for (Iterator<Tuple> code = bag.iterator(); code.hasNext();) {
                categoryCode = code.next();
                auxiliary.set(i, categoryCode.get(0).toString());
                i += 1;
            }
            return auxiliary.toDelimitedString(",");
        }
    }