Apache Pig: LIMIT without ordering in Pig


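I ran into a problem when using LIMIT in Pig.

The result of LIMIT is sorted, but I don't want the result to be sorted.

According to the example on the website: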

A = LOAD 'data' AS (a1:int,a2:int,a3:int);

DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
Using LIMIT:

X = LIMIT A 3;

DUMP X;
(1,2,3)
(4,3,3)
(7,2,5)
Is it possible to show the first 3 rows in the result without sorting?

(1,2,3)
(4,2,1)
(8,3,4)
My code is as follows:

A = LOAD '$input';
B = foreach A generate $s_field;
C = FILTER B BY $pattern;
D = FOREACH C {
            topnresult = LIMIT B $lines;
            GENERATE FLATTEN(topnresult);
        }
dump D;

Thank you very much.

By default, LIMIT internally runs an ORDER and then the LIMIT, so you end up with a sorted list. There are several ways to work around this; one option is:

input.txt

1       2       3
4       2       1
8       3       4
4       3       3
7       2       5
8       4       3
PigScript:

A = LOAD 'input.txt' AS (a1:int,a2:int,a3:int);
B = RANK A;
C = FILTER B BY rank_A<=3;
D = FOREACH C GENERATE a1,a2,a3;
DUMP D;
(1,2,3)
(4,2,1)
(8,3,4)
Option 2:

A = LOAD 'input.txt' AS (a1:int,a2:int,a3:int);
B = GROUP A ALL;
C = FOREACH B {
                top3list =  LIMIT A 3;
                GENERATE FLATTEN(top3list);
              }
DUMP C;
(1,2,3)
(4,2,1)
(8,3,4)
Update: Solution 1

A = LOAD '$input';
B = foreach A generate $s_field;
C = FILTER B BY $pattern;
D = GROUP C ALL;
E = FOREACH D {
            topnresult = LIMIT C $lines;
            GENERATE FLATTEN(topnresult);
        }
DUMP E;
Solution 2:

A = LOAD '$input';
B = foreach A generate $s_field;
C = FILTER B BY $pattern;
D = RANK C;
E = FILTER D BY rank_C<=$lines;
F = FOREACH E GENERATE $1..;
DUMP F;
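
pig -x local -param input='input.txt' -param s_field='$0,$1,$2' -param pattern='$0…

What versions of Pig and Hadoop are you using? Can you paste your sample data and Pig script?

Can you paste your Pig script?

The reason is that you are mixing the two solutions; please use one or the other. A nested FOREACH cannot be used after a FILTER statement, because a nested FOREACH only works on bags. Please update the question with the sample input data format and the values of $s_field and $pattern, and I will help you get the result.

Is it possible to get the result without modifying anything?

Hi, sorry for the late reply. The result seems to be sorted by some other criterion, but I don't understand the sorting method. My program has two options: one shows only the top n rows of the result, and the other shows everything. For the "show all" option, I use LIMIT C 9999999 to show all results. Is that the right approach? Can we use a wildcard in Pig, such as LIMIT C *? Judging from the results, I would expect the two options (top n and show all) to start with the same rows, wouldn't they? What is the sorting criterion of LIMIT?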
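
On the last question, a hedged note: as far as the Pig documentation describes it, LIMIT on its own does not guarantee which tuples are returned or in what order, and only an ORDER placed immediately before LIMIT fixes the selection. A minimal sketch under that assumption, reusing input.txt from the answer above. The aliases are illustrative, and the row number that RANK prepends is assumed to follow load order, as in the answer's scripts:

```pig
-- Sketch only: make "first 3 rows" explicit by ordering on the row number.
A = LOAD 'input.txt' AS (a1:int, a2:int, a3:int);
-- RANK without a BY clause prepends a sequential row number named rank_A.
B = RANK A;
-- ORDER directly before LIMIT, so the rows LIMIT keeps are well defined.
C = ORDER B BY rank_A ASC;
D = LIMIT C 3;
-- Drop the helper column again.
E = FOREACH D GENERATE a1, a2, a3;
DUMP E;
```

For the "show all" case, it is probably simpler to skip the LIMIT statement altogether than to pass a very large number; I am not aware of a wildcard form such as LIMIT C *.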