
Hadoop: optimizing a Pig script for aggregated output

Tags: hadoop, apache-pig

I am trying to generate aggregated output. What is the best way to do it:

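-- Variant 1: group by ID, then filter and count inside each group.
-- Note: PARALLEL expects an integer reducer count (PARALLEL n); none is given in the post.
A_GROUP = GROUP A BY ID PARALLEL;
A_COUNT = FOREACH A_GROUP {
        A_TMP1 = FILTER A BY Col1 == 'Other';   -- rows of this group where Col1 is 'Other'
        A_TMP2 = FILTER A BY Col2 == 'Other';   -- rows of this group where Col2 is 'Other'
        cnt_fltrCol1 = COUNT(A_TMP1);
        cnt_fltrCol2 = COUNT(A_TMP2);
        GENERATE group, cnt_fltrCol1, cnt_fltrCol2;
}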
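Or:

-- Variant 2: add 0/1 flag columns first, then sum the flags per ID.
A_FOREACH = FOREACH A GENERATE *,
        ((Col1 == 'Other') ? 1 : 0) AS fltrCol1,
        ((Col2 == 'Other') ? 1 : 0) AS fltrCol2;

A_GRP = GROUP A_FOREACH BY ID;

A_COUNT = FOREACH A_GRP {
        -- After GROUP, the bag is named A_FOREACH, so the flag columns are projected from it.
        cnt_fltrCol1 = SUM(A_FOREACH.fltrCol1);
        cnt_fltrCol2 = SUM(A_FOREACH.fltrCol2);
        GENERATE group, cnt_fltrCol1, cnt_fltrCol2;
}

Currently, I am running into memory problems (my real script is much larger).
Thanks in advance for your answers.

Is the first one a count and the second one a sum?
Yes, but they are doing the same thing...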
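
The claim that both variants "do the same thing" can be checked on toy data. The following is a minimal Python sketch, not Pig, that models the two approaches: filter-then-count per group versus flag-then-sum per group. The sample rows and the function names `count_by_filter` and `count_by_sum` are hypothetical and only serve as illustration; the sketch also assumes there are no null values in `Col1` or `Col2`, which Pig would treat differently from Python.

```python
from collections import defaultdict

# Hypothetical sample rows (ID, Col1, Col2); not taken from the original post.
rows = [
    (1, "Other", "x"),
    (1, "Other", "Other"),
    (2, "y", "Other"),
    (2, "z", "w"),
]


def count_by_filter(rows):
    """Variant 1: group by ID, then filter each group and count the matches."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # the whole group is materialized first
    return {
        key: (
            len([r for r in grp if r[1] == "Other"]),
            len([r for r in grp if r[2] == "Other"]),
        )
        for key, grp in groups.items()
    }


def count_by_sum(rows):
    """Variant 2: turn each condition into a 0/1 flag, then sum the flags per ID."""
    totals = defaultdict(lambda: [0, 0])
    for id_, col1, col2 in rows:
        totals[id_][0] += 1 if col1 == "Other" else 0
        totals[id_][1] += 1 if col2 == "Other" else 0
    return {key: tuple(val) for key, val in totals.items()}


# Both variants should agree on the per-ID counts.
print(count_by_filter(rows) == count_by_sum(rows))
```

In this model the second variant only keeps two running totals per ID, while the first holds every row of a group before filtering. Whether that difference carries over to an actual Pig job depends on the execution plan Pig chooses, so this sketch only shows that the results agree.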