Java - Iterating over an array in Pig

I have records with the following structure:

"event" : [ {"x":"1","y":"2"} , {"x":"5","y":"2"}]
"event" : [ {"random":"r", "pol" : "t", "a" : "b"} , {"x":"4","y":"5"}]
"event" : [ {"random":"f", "pol" : "w", "a" : "r"} , {"x":"12","y":"5"} , {"x":"6","y":"7"}]
The fields I'm interested in are x and y. For each record, I need to extract the map with the highest value of x.

That is, for the first event select {x:5,y:2}, for the second {x:4,y:5}, and for the third {x:12,y:5}.
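The selection rule can be sketched in plain Java as follows. `MaxXPerRecord` and `maxByX` are hypothetical names, and the maps are assumed to hold string values as in the sample records. Note that x has to be compared as a number: compared as strings, "6" sorts after "12", which would pick the wrong map in the third record.

```java
import java.util.List;
import java.util.Map;

public class MaxXPerRecord {
    // Returns the map with the largest numeric "x" among the maps that have an "x" key,
    // or null if none does. Values are parsed as ints because the sample data stores
    // them as strings, and string comparison would rank "6" above "12".
    static Map<String, String> maxByX(List<Map<String, String>> events) {
        Map<String, String> best = null;
        int bestX = Integer.MIN_VALUE;
        for (Map<String, String> event : events) {
            String x = event.get("x");
            if (x == null) {
                continue; // skip maps like {"random":..., "pol":..., "a":...}
            }
            int xValue = Integer.parseInt(x);
            if (best == null || xValue > bestX) {
                best = event;
                bestX = xValue;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The three sample records from the question.
        List<List<Map<String, String>>> records = List.of(
                List.of(Map.of("x", "1", "y", "2"), Map.of("x", "5", "y", "2")),
                List.of(Map.of("random", "r", "pol", "t", "a", "b"), Map.of("x", "4", "y", "5")),
                List.of(Map.of("random", "f", "pol", "w", "a", "r"),
                        Map.of("x", "12", "y", "5"), Map.of("x", "6", "y", "7")));
        for (List<Map<String, String>> record : records) {
            Map<String, String> best = maxByX(record);
            // Prints x=5, y=2 then x=4, y=5 then x=12, y=5
            System.out.println("x=" + best.get("x") + ", y=" + best.get("y"));
        }
    }
}
```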


I know we could use a UDF that iterates over each map in the array and picks the one with the largest x value, but is there a way to do this without writing a UDF?

You can do it like this:

REGISTER elephant-bird-core-4.5.jar;
REGISTER elephant-bird-hadoop-compat-4.5.jar;
REGISTER elephant-bird-pig-4.5.jar;

DEFINE JsonLoader com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true');

records = LOAD '$DATA_PATH' USING JsonLoader() AS (data: map[]);
events = FOREACH records GENERATE 
                                FLATTEN(data#'event') AS event;

grouped_events = COGROUP events BY (event#'x', event#'y');

result = FOREACH grouped_events GENERATE
        MAX(events.event#'x'),
        MAX(events.event#'y');
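The -nestedLoad option makes JsonLoader load nested JSON objects and arrays as maps and bags, so the event array can be flattened into individual events as shown above. Note, however, that grouping by both x and y puts each distinct (x, y) pair in its own group, so MAX simply returns that pair; it does not pick the event with the largest x within each record. For that, each event would need to carry a per-record key (for example one assigned with Pig's RANK operator) to group by instead.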
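To check what the script's FLATTEN and GROUP/MAX steps produce on the sample data, here is a plain-Java model of them. It is an illustration, not Pig itself: `GroupByXYModel` and `groupAndMax` are hypothetical names, maps without x are skipped (in Pig they would fall into a group with a null key), and every map with x is assumed to also have y, as in the sample records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByXYModel {
    // Models FLATTEN (all records' events in one stream, record boundaries lost),
    // then GROUP BY (x, y), then MAX of x and MAX of y within each group.
    static Map<List<String>, int[]> groupAndMax(List<List<Map<String, String>>> records) {
        Map<List<String>, int[]> groups = new LinkedHashMap<>();
        for (List<Map<String, String>> record : records) {
            for (Map<String, String> event : record) {
                if (!event.containsKey("x")) {
                    continue; // simplification: Pig would put these under a null key
                }
                List<String> key = List.of(event.get("x"), event.get("y"));
                int x = Integer.parseInt(event.get("x"));
                int y = Integer.parseInt(event.get("y"));
                // Per-group MAX: keep the larger x and the larger y seen for this key.
                groups.merge(key, new int[] {x, y},
                        (a, b) -> new int[] {Math.max(a[0], b[0]), Math.max(a[1], b[1])});
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Map<String, String>>> records = List.of(
                List.of(Map.of("x", "1", "y", "2"), Map.of("x", "5", "y", "2")),
                List.of(Map.of("random", "r", "pol", "t", "a", "b"), Map.of("x", "4", "y", "5")),
                List.of(Map.of("random", "f", "pol", "w", "a", "r"),
                        Map.of("x", "12", "y", "5"), Map.of("x", "6", "y", "7")));
        groupAndMax(records).forEach((key, max) ->
                System.out.println(key + " -> max x=" + max[0] + ", max y=" + max[1]));
    }
}
```

Running it prints five lines, one per distinct (x, y) pair, starting with `[1, 2] -> max x=1, max y=2`, and in every line the MAX values equal the group's own key.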