Hadoop: Computing a sum inside FOREACH


Suppose I have the following:

DATA = foreach INPUT {
   //..
   generate group, count(name) as total;
}
I end up with a relation whose rows are keyed by name:

('mike', 'someprop', 10)
('mike', 'otherprop', 3)
('doug', 'xprop', 5)
...
I want to get the sum of the top 10 totals for each name:

ALIAS = group DATA by name;
RESULT = foreach ALIAS {
   SORTED = ORDER DATA by total desc;
   TOP10 = LIMIT SORTED 10;

   //doesn't work! can't have GROUP inside FOREACH
   AGG = group TOP10 ALL;
   TOPTOTAL = foreach AGG generate SUM(AGG.total);

   generate group, TOPTOTAL;
}

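How can I compute aggregate values (sum, count, etc.) over a relation inside a foreach? It is currently not possible to apply GROUP ... ALL inside a foreach.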

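SUM is just a function that takes a bag as its argument, and you can create that bag by projecting from TOP10 (TOP10.total is the bag of total values from the tuples in TOP10):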

ALIAS = group DATA by name;
RESULT = foreach ALIAS {
   SORTED = ORDER DATA by total desc;
   TOP10 = LIMIT SORTED 10;
   generate group, SUM(TOP10.total);
}
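
As a sanity check on what the nested foreach above produces, here is a minimal Python sketch that models the same steps (group by name, sort by total descending, keep the first 10, sum) on the sample rows from the question. It is not Pig code. The function name top_n_sum and the (name, prop, total) tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

def top_n_sum(rows, n=10):
    """Sum the n largest totals per name.

    Hypothetical helper modelling: ORDER ... BY total DESC, LIMIT n, SUM(TOP10.total).
    Assumes each row is a (name, prop, total) tuple, as in the sample data.
    """
    groups = defaultdict(list)
    for name, _prop, total in rows:
        groups[name].append(total)          # group DATA by name
    return {
        name: sum(sorted(totals, reverse=True)[:n])  # order desc, limit n, sum
        for name, totals in groups.items()
    }

# Sample rows taken from the question
rows = [
    ("mike", "someprop", 10),
    ("mike", "otherprop", 3),
    ("doug", "xprop", 5),
]
result = top_n_sum(rows)
print(result)
```

With these three rows each name has fewer than 10 entries, so the limit has no effect and the result is simply the per-name sum: 13 for mike and 5 for doug.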