Apache Pig: How to find the second-highest rating in Pig?

My data looks like this:

USA,10  
UK,8  
INDIA,8  
PAKISTAN,5  
U.A.E,3  
GERMANY,3  
SWEDEN,2
How can I get the countries with the second-highest rating? Given the sample data above, I expect:

UK,8  
INDIA,8 
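Can you try this?

Update:
If your Pig version does not have the RANK operator, this problem is hard to solve in native Pig. One option is to download pig-0.11.1.jar, put it on your classpath, and then try the approach below.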

input.txt

USA,10
UK,8
INDIA,8
PAKISTAN,5
U.A.E,3
GERMANY,3
SWEDEN,2
PigScript:

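-- Over and Stitch ship with Piggybank; if the classes are not already on the
-- classpath, REGISTER the Piggybank jar before these DEFINE statements.
-- 'myrank:int' names and types the rank field that Over returns.
DEFINE MyOver org.apache.pig.piggybank.evaluation.Over('myrank:int');
DEFINE MyStitch org.apache.pig.piggybank.evaluation.Stitch;

A = LOAD 'input.txt' USING PigStorage(',') AS (country:chararray,rating:int);
-- Put every row into a single bag so the whole input is ranked together.
B = GROUP A ALL;
C = FOREACH B {
        mysort = ORDER A BY rating DESC;
        -- Over computes dense_rank on the sorted bag; Stitch attaches each rank to its row.
        GENERATE FLATTEN(MyStitch(mysort,MyOver(mysort,'dense_rank',0,1,1)));
    }
-- Keep only the rows whose dense rank is 2, i.e. the second-highest rating.
D = FILTER C BY stitched::myrank==2;
E = FOREACH D GENERATE stitched::country AS country,stitched::rating AS rating;
DUMP E;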
Output:

(UK,8)
(INDIA,8)

Pig 0.11 and later support the RANK operator:

A = LOAD 'input.txt' USING PigStorage(',') AS (country:chararray,rating:int);
B = RANK A BY rating DESC;
C = FILTER B BY rank_A==2;
D = FOREACH C GENERATE country,rating;
DUMP D;
Output:

(UK,8)
(INDIA,8)

My Hadoop Pig version is 0.8.0, so I can't use RANK.

I've updated the answer with another approach; I hope it works for you.
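For anyone who wants to check the expected result without a Pig installation, here is a minimal plain-Python sketch of the two ranking schemes involved. It is not Pig code and not part of the original answer. The function names `dense_rank` and `standard_rank` are made up for illustration, and the sketch assumes that RANK without the DENSE keyword leaves a gap after ties while 'dense_rank' does not.

```python
# Plain-Python model of the ranking logic, meant only for checking results.
# The function names are illustrative; they are not Pig or Piggybank APIs.

def dense_rank(rows):
    """Return (country, rating, rank); ties share a rank and ranks have no gaps."""
    distinct = sorted({rating for _, rating in rows}, reverse=True)
    position = {rating: i + 1 for i, rating in enumerate(distinct)}
    return [(country, rating, position[rating]) for country, rating in rows]


def standard_rank(rows):
    """Return (country, rating, rank); a tie leaves a gap after it (1, 2, 2, 4, ...)."""
    ratings = [rating for _, rating in rows]
    return [(country, rating, 1 + sum(r > rating for r in ratings))
            for country, rating in rows]


# Same sample data as input.txt in the question.
rows = [("USA", 10), ("UK", 8), ("INDIA", 8), ("PAKISTAN", 5),
        ("U.A.E", 3), ("GERMANY", 3), ("SWEDEN", 2)]

second_dense = [(c, r) for c, r, k in dense_rank(rows) if k == 2]
second_standard = [(c, r) for c, r, k in standard_rank(rows) if k == 2]

print(second_dense)     # [('UK', 8), ('INDIA', 8)]
print(second_standard)  # [('UK', 8), ('INDIA', 8)]
```

Both schemes agree at rank 2 for this data, which is why both scripts print the same rows. They would differ at rank 3: the dense scheme returns PAKISTAN, while the standard scheme returns nothing because the tie at 8 pushes PAKISTAN to rank 4.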