Apache Pig: can't call a Java UDF that accepts a tuple as input

apache-pig

I don't understand how to call a Java UDF that accepts a tuple as input.

gsmCell = LOAD '$gsmCell' using PigStorage('\t') as
          (branchId,
           cellId: int,
           lac: int,
           lon: double,
           lat: double
          );

gsmCellFiltered = FILTER gsmCell BY     cellId     is not null and
                                        lac        is not null and
                                        lon        is not null and
                                        lat        is not null;

gsmCellFixed = FOREACH gsmCellFiltered GENERATE FLATTEN (pig.parser.GSMCellParser(* ) )  as
                                                (cellId: int,
                                                 lac: int,
                                                 lon: double,
                                                 lat: double
                                                );

When I wrap the input of GSMCellParser in parentheses, (), the UDF receives Tuple(Tuple): Pig wraps all the fields into a tuple and puts that tuple inside another tuple.

When I try to pass the list of fields using * or $0.., I get an exception:

Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045: 
<line 28, column 57> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.
    at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:761)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:88)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191)
    at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:246)

What am I doing wrong? My goal is to give my UDF a tuple. The tuple should contain the list of fields (i.e. it should have size 4: cellId, lac, lon, lat).

UPD: I have also tried GROUP ALL:

--filter non valid records
gsmCellFiltered = FILTER gsmCell BY     cellId     is not null and
                                        lac        is not null and
                                        lon        is not null and
                                        lat        is not null and
                                        azimuth    is not null and
                                        angWidth   is not null;

gsmCellFilteredGrouped = GROUP gsmCellFiltered ALL;

--fix records
gsmCellFixed = FOREACH gsmCellFilteredGrouped GENERATE FLATTEN(pig.parser.GSMCellParser($1)) as
                                                        (cellId: int,
                                                         lac: int,
                                                         lon: double,
                                                         lat: double,
                                                         azimuth: double,
                                                         ppw,
                                                         midDist: double,
                                                         maxDist,
                                                         cellType: chararray,
                                                         angWidth: double,
                                                         gen: chararray,
                                                         startAngle: double
                                                        );



Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045: 
<line 27, column 64> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.

The input schema for this UDF is: Tuple. I don't get it. A Tuple is an ordered set of fields, and the LOAD function returns a tuple to me.

I want to pass the whole tuple to my UDF.

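From the signature of the method

T EvalFunc.eval(Tuple)

you can see that every EvalFunc UDF is passed a single Tuple, and that Tuple contains all the arguments passed to the UDF.

In your case, calling

GSMCellParser(*)

means that the first field of that argument tuple will be the current tuple being processed (hence a tuple within a tuple).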
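To make the "tuple within a tuple" concrete, here is a minimal plain-Java sketch that runs without the Pig jar. It models a Pig tuple as a java.util.List<Object>; this is an assumption for illustration only (the real type is org.apache.pig.data.Tuple, whose get(int) and size() play the same role). The class NestedArgTupleSketch and its method cellIdFromStarCall are hypothetical. Following the answer's description of GSMCellParser(*), the argument tuple carries a single field, the current record, so the record must be unwrapped before its fields can be read.

```java
import java.util.Arrays;
import java.util.List;

final class NestedArgTupleSketch {

    // Hypothetical stand-in for eval(Tuple input); a Pig tuple is modelled
    // as a List<Object> so the sketch compiles and runs without Pig.
    static int cellIdFromStarCall(List<Object> argTuple) {
        // With GSMCellParser(*), the argument tuple holds exactly one field:
        // the current input record, i.e. a tuple inside a tuple.
        if (argTuple.size() != 1 || !(argTuple.get(0) instanceof List)) {
            throw new IllegalArgumentException("expected one nested record tuple");
        }
        @SuppressWarnings("unchecked")
        List<Object> record = (List<Object>) argTuple.get(0);
        // The record follows the LOAD schema (branchId, cellId, lac, lon, lat),
        // so cellId sits at index 1 of the inner tuple.
        return (Integer) record.get(1);
    }

    public static void main(String[] args) {
        List<Object> record = Arrays.<Object>asList("b1", 42, 7, 30.5, 59.9);
        // The argument tuple wraps the record: ((b1, 42, 7, 30.5, 59.9))
        List<Object> argTuple = Arrays.<Object>asList(record);
        System.out.println(cellIdFromStarCall(argTuple)); // prints 42
    }
}
```

In a real UDF this unwrapping step would live inside eval; passing the fields explicitly, as the answer recommends, removes the need for it.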

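Conceptually, if you want the tuple to contain only the fields, call the UDF as

GSMCellParser(cellid, lac, lat, lon)

Then the tuple passed to the eval function will have the schema

(int, int, double, double)

This also makes coding the UDF easier: you don't have to dig the fields out of a "tuple within a tuple"; you simply know that field 0 is cellid, field 1 is lac, and so on.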
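With explicit arguments, each field sits at a fixed position in the argument tuple. The plain-Java sketch below again models the tuple as a List<Object> (an assumption, so it runs without Pig) and reads the four fields by index for a call shaped like GSMCellParser(cellId, lac, lon, lat), the order used in the question's LOAD schema. The class FlatArgTupleSketch, its describe method, and the index constants CELL_ID, LAC, LON and LAT are all hypothetical; naming the indices is one way to soften the "numbered args" problem raised in the comment.

```java
import java.util.Arrays;
import java.util.List;

final class FlatArgTupleSketch {

    // Hypothetical positional constants for GSMCellParser(cellId, lac, lon, lat);
    // they must match the argument order used in the Pig script.
    static final int CELL_ID = 0;
    static final int LAC = 1;
    static final int LON = 2;
    static final int LAT = 3;

    // Models eval(Tuple input) with the tuple as a List<Object>;
    // expected schema of the argument tuple: (int, int, double, double).
    static String describe(List<Object> argTuple) {
        if (argTuple.size() != 4) {
            throw new IllegalArgumentException("expected (cellId, lac, lon, lat)");
        }
        int cellId = (Integer) argTuple.get(CELL_ID);
        int lac = (Integer) argTuple.get(LAC);
        double lon = (Double) argTuple.get(LON);
        double lat = (Double) argTuple.get(LAT);
        return "cell " + cellId + "/" + lac + " at (" + lon + ", " + lat + ")";
    }

    public static void main(String[] args) {
        List<Object> argTuple = Arrays.<Object>asList(42, 7, 30.5, 59.9);
        System.out.println(describe(argTuple)); // prints cell 42/7 at (30.5, 59.9)
    }
}
```

If the call order in the script changes, the constants have to change with it; that coupling is the price of positional access.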

Yes, I've figured it out. I moved to Jython: the Java interface is too heavy and verbose, more coding than benefit. Another advantage is that Jython allows named parameters, whereas Java only offers numbered access, and numbered args are a great opportunity to make silly mistakes. Thanks!