Apache Pig: LIMIT without ordering in Pig
I ran into a problem when using LIMIT in Pig: the result of LIMIT comes back sorted, but I do not want the result to be sorted.
According to the example on the website:
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
Using LIMIT:
X = LIMIT A 3;
DUMP X;
(1,2,3)
(4,3,3)
(7,2,5)
Is it possible to show the first 3 rows in the result without sorting?
(1,2,3)
(4,2,1)
(8,3,4)
My code is as follows:
A = LOAD '$input';
B = foreach A generate $s_field;
C = FILTER B BY $pattern;
D = FOREACH C {
topnresult = LIMIT B $lines;
GENERATE FLATTEN(topnresult);
}
dump D;
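Thank you very much.

LIMIT on its own does not guarantee which tuples are returned or in what order, so the output can differ from the load order and may look sorted. There are several ways to work around this; one option is:

input.txt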
1 2 3
4 2 1
8 3 4
4 3 3
7 2 5
8 4 3
PigScript:
A = LOAD 'input.txt' AS (a1:int,a2:int,a3:int);
B = RANK A;
C = FILTER B BY rank_A<=3;
D = FOREACH C GENERATE a1,a2,a3;
DUMP D;
Option 2:
A = LOAD 'input.txt' AS (a1:int,a2:int,a3:int);
B = GROUP A ALL;
C = FOREACH B {
top3list = LIMIT A 3;
GENERATE FLATTEN(top3list);
}
DUMP C;
Output:
(1,2,3)
(4,2,1)
(8,3,4)
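Because Pig relations are unordered, the sequence in which DUMP prints the filtered tuples is not guaranteed either. A minimal sketch, assuming the same tab-delimited input.txt and schema as in the first option, keeps the rank column and sorts on it as the last step so that the printed order is explicit. It relies on RANK without a BY clause prepending a sequential value named rank_A to each tuple; that sequence matches the file order only when the input is read as a single split, which is an assumption here.

```pig
-- Sketch only: assumes input.txt is tab-delimited with three int columns.
A = LOAD 'input.txt' AS (a1:int, a2:int, a3:int);
-- RANK with no BY clause prepends a sequential number (field rank_A).
B = RANK A;
-- Keep the first three tuples by that sequence number.
C = FILTER B BY rank_A <= 3;
-- Sort on the rank as the final step so DUMP prints in that order.
D = ORDER C BY rank_A ASC;
DUMP D;
```

The rank column stays in the output in this sketch. Projecting it away after the ORDER would add a processing step, and Pig's documentation does not guarantee that order survives further processing of an ordered relation.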
Update: Solution 1
A = LOAD '$input';
B = foreach A generate $s_field;
C = FILTER B BY $pattern;
D = GROUP C ALL;
E = FOREACH D {
topnresult = LIMIT C $lines;
GENERATE FLATTEN(topnresult);
}
DUMP E;
Solution 2:
A = LOAD '$input';
B = foreach A generate $s_field;
C = FILTER B BY $pattern;
D = RANK C;
E = FILTER D BY rank_C<=$lines;
F = FOREACH E GENERATE $1..;
DUMP F;
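pig -x local -param input='input.txt' -param s_field='$0,$1,$2' -param pattern='$0...

Which versions of Pig and Hadoop are you using? Can you paste your sample data and Pig script?

Can you paste your Pig script?

The reason is that you are mixing the two solutions; please use one or the other. You cannot use a nested FOREACH directly after the FILTER statement, because a nested FOREACH only works on bags.

Please update the question with the sample input data format and the values of $s_field and $pattern. I will help you get the result.

Is it possible to get the result without modifying anything?

Hi, sorry for the late reply. The result seems to be sorted by some other criterion, but I do not understand the sorting method. My program has two options: one shows only the top n rows/results, the other shows everything. For the "show all" option I use LIMIT C 9999999 to display all results. Is that the right approach? Can we use a wildcard in Pig, such as LIMIT C *? Judging from the results, shouldn't the output of the two options (top n and show all) start with the same rows? What is the sorting criterion of LIMIT?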