
(hadoop.pig) Multiple counts in a single table


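So, I have a dataset with two fields, a string and a number:

data(string:chararray, number:int)
I count it under 5 different rules:

1: int is between 0 and 1

2: int is between 1 and 2

...

5: int is between 4 and 5

So I can count each range on its own: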

zero_to_one = filter avg_user by average_stars >= 0 and average_stars <= 1;
A = GROUP zero_to_one ALL;
zto_count = FOREACH A GENERATE COUNT(zero_to_one);

one_to_two = filter avg_user by average_stars > 1 and average_stars <= 2;
B = GROUP one_to_two ALL;
ott_count = FOREACH B GENERATE COUNT(one_to_two);

two_to_three = filter avg_user by average_stars > 2 and average_stars <= 3;
C = GROUP two_to_three ALL;
ttt_count = FOREACH C GENERATE COUNT( two_to_three);

three_to_four = filter avg_user by average_stars > 3 and average_stars <= 4;
D = GROUP three_to_four ALL;
ttf_count = FOREACH D GENERATE COUNT( three_to_four);

four_to_five = filter avg_user by average_stars > 4 and average_stars <= 5;
E = GROUP four_to_five ALL;
ftf_count = FOREACH E GENERATE COUNT( four_to_five);
Then the table would be {1,3,2,3,5}.

Is there an easy way to parse the data and organize it like this?

Using this as input:

foo 2
foo 3
foo 2
foo 3
foo 5
foo 4
foo 0
foo 4
foo 4
foo 5
foo 1
foo 5
(0 and 1 each appear once, 2 and 3 each appear twice, 4 and 5 each appear three times)

this script:

A = LOAD 'myData' USING PigStorage(' ') AS (name: chararray, number: int);

B = FOREACH (GROUP A BY number) GENERATE group AS number, COUNT(A) AS count ;

C = FOREACH (GROUP B ALL) {
    zto = FOREACH B GENERATE (number==0?count:0) + (number==1?count:0) ;
    ott = FOREACH B GENERATE (number==1?count:0) + (number==2?count:0) ;
    ttt = FOREACH B GENERATE (number==2?count:0) + (number==3?count:0) ;
    ttf = FOREACH B GENERATE (number==3?count:0) + (number==4?count:0) ;
    ftf = FOREACH B GENERATE (number==4?count:0) + (number==5?count:0) ;
    GENERATE SUM(zto) AS zto,
             SUM(ott) AS ott,
             SUM(ttt) AS ttt,
             SUM(ttf) AS ttf,
             SUM(ftf) AS ftf ;
}
produces this output:

C: {zto: long,ott: long,ttt: long,ttf: long,ftf: long}
(2,3,4,5,6)
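The number of FOREACH statements inside C should not really matter, because the bag they iterate over holds at most one row per distinct number (six rows for the values 0 to 5). If it does matter, they can be combined like this: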

C = FOREACH (GROUP B ALL) {
    total = FOREACH B GENERATE (number==0?count:0) + (number==1?count:0) AS zto,
                               (number==1?count:0) + (number==2?count:0) AS ott,
                               (number==2?count:0) + (number==3?count:0) AS ttt,
                               (number==3?count:0) + (number==4?count:0) AS ttf,
                               (number==4?count:0) + (number==5?count:0) AS ftf ;
    GENERATE SUM(total.zto) AS zto,
             SUM(total.ott) AS ott,
             SUM(total.ttt) AS ttt,
             SUM(total.ttf) AS ttf,
             SUM(total.ftf) AS ftf ;
}
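
Note that the expressions above treat every range as inclusive at both ends, so a row with number 1 is counted in both zto and ott. The filters in the question use half-open ranges instead (> 1 and <= 2, and so on), so each value lands in exactly one bucket. A minimal sketch of that variant, assuming integer data from 0 to 5 and the same 'myData' file as above:

```pig
A = LOAD 'myData' USING PigStorage(' ') AS (name: chararray, number: int);

B = FOREACH (GROUP A BY number) GENERATE group AS number, COUNT(A) AS count ;

-- Each integer value contributes to exactly one bucket, mirroring the
-- question's filters: [0,1], (1,2], (2,3], (3,4], (4,5].
C = FOREACH (GROUP B ALL) {
    total = FOREACH B GENERATE ((number>=0 AND number<=1)?count:0) AS zto,
                               (number==2?count:0) AS ott,
                               (number==3?count:0) AS ttt,
                               (number==4?count:0) AS ttf,
                               (number==5?count:0) AS ftf ;
    GENERATE SUM(total.zto) AS zto,
             SUM(total.ott) AS ott,
             SUM(total.ttt) AS ttt,
             SUM(total.ttf) AS ttf,
             SUM(total.ftf) AS ftf ;
}
```

With the sample input above, this version should yield (2,2,2,3,3) rather than (2,3,4,5,6), since the boundary values are no longer counted twice.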

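Thank you very much, Mr. 2ert. I think I have it all figured out now. I appreciate your help! One thing you didn't mention is that this code only covers data without decimal points, but since I never said the data could have decimals (I said int, which was my mistake), your answer is perfect. I can fix that part myself. Thank you very much!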
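
For data with decimal values, the equality tests on number no longer apply. One possible approach, sketched here under the assumption that the second column is loaded as a double (the alias stars is hypothetical and not part of the original answer), is to turn each range test into a 0/1 flag per row and then sum the flags in a single GROUP ALL:

```pig
-- Assumption: the second column holds decimal values in [0, 5];
-- the alias `stars` is illustrative only.
A = LOAD 'myData' USING PigStorage(' ') AS (name: chararray, stars: double);

-- One 0/1 flag per range for every row, using the question's boundaries.
B = FOREACH A GENERATE
        ((stars >= 0.0 AND stars <= 1.0) ? 1L : 0L) AS zto,
        ((stars >  1.0 AND stars <= 2.0) ? 1L : 0L) AS ott,
        ((stars >  2.0 AND stars <= 3.0) ? 1L : 0L) AS ttt,
        ((stars >  3.0 AND stars <= 4.0) ? 1L : 0L) AS ttf,
        ((stars >  4.0 AND stars <= 5.0) ? 1L : 0L) AS ftf ;

-- Summing the flags gives all five counts in a single row.
C = FOREACH (GROUP B ALL) GENERATE
        SUM(B.zto) AS zto,
        SUM(B.ott) AS ott,
        SUM(B.ttt) AS ttt,
        SUM(B.ttf) AS ttf,
        SUM(B.ftf) AS ftf ;
```

This skips the intermediate GROUP BY number, which would produce one group per distinct decimal value and so would no longer stay small.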