Apache Pig: maximum of multiple columns in a Pig join
I have two data files to load into Pig:
A = LOAD 'temp.csv' USING PigStorage(',') AS (user:chararray,day:chararray,joinKey:chararray);
B = LOAD 'new.csv' USING PigStorage(',') AS (user:chararray,day:chararray,joinKey:chararray);
c = join A by (joinKey),B by (joinKey);
d = FOREACH c GENERATE MAX(A::day,B::day) as maxDay;
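This does not work because there is no group by.
How can I get the maximum of the two columns?
After getting the maximum, I need to store the user associated with the "max day" field.

I found the answer to my second question: I need to use CASE.

What output do you want to get? If you are only interested in the max date, you can union the two datasets and then use group by.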
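
Both ideas can be sketched roughly as follows. This is a minimal, untested sketch, not a definitive answer. It assumes `day` is stored in a format that sorts correctly as a string (for example `yyyy-MM-dd`), because the fields are declared as `chararray`. The aliases `maxUser`, `u`, `g` and `m` are made up for illustration. The first part uses a bincond expression in place of CASE to pick the larger day and its user per joined row; the second part follows the union and group by suggestion.

```pig
A = LOAD 'temp.csv' USING PigStorage(',') AS (user:chararray, day:chararray, joinKey:chararray);
B = LOAD 'new.csv' USING PigStorage(',') AS (user:chararray, day:chararray, joinKey:chararray);

-- Option 1: per joined row, pick the larger day and the user it belongs to.
-- MAX() expects a bag, so a bincond (cond ? x : y) compares the two scalars instead.
-- Assumption: string comparison matches date order (e.g. yyyy-MM-dd).
c = JOIN A BY joinKey, B BY joinKey;
d = FOREACH c GENERATE
        (A::day >= B::day ? A::day  : B::day)  AS maxDay,
        (A::day >= B::day ? A::user : B::user) AS maxUser;

-- Option 2: union both inputs, group by the key, and keep the row
-- with the largest day in each group (this also carries the user along).
u = UNION A, B;
g = GROUP u BY joinKey;
m = FOREACH g {
        sorted = ORDER u BY day DESC;
        top    = LIMIT sorted 1;
        GENERATE FLATTEN(top);
};
```

Note that if either `day` is null, the bincond in option 1 yields null for that row, so null handling may need an extra check depending on the data.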