Apache Pig: how to group by index and index+1


I have input data like this:

(index,x,y)
(1,0.0,0.0)
(2,-0.1,-0.1)
(3,1.0,-2.2)
...
How can I group the rows by [index] and [index+1], like this:

{(1, 0.0, 0.0), (2, -0.1, -0.1)}
{(2, -0.1, -0.1), (3, 1.0, -2.2)}
...

Please help me solve this problem. Thanks.

You can use the following query (explanations are in the comments).
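The query text itself is not preserved here, so what follows is only a hedged reconstruction, not the answerer's original script. It assumes a COGROUP-based approach, which is what the dumps below suggest. The relation names (r, r1, r2, result, result2, result3, result4) are taken from the dump labels; the path 'input', the schema, and the field name grp_key are assumptions made for illustration. The exact tuple nesting of the intermediate relations may differ from the dumps.

```pig
-- Load the comma-separated file ('input' is a placeholder path; the schema is assumed).
r = LOAD 'input' USING PigStorage(',') AS (index:int, x:double, y:double);

-- Key every row by its own index.
r1 = FOREACH r GENERATE index AS grp_key, index, x, y;

-- Key every row by index + 1, so it lines up with its successor in r1.
r2 = FOREACH r GENERATE index + 1 AS grp_key, index, x, y;

-- Bring rows that share the same key together.
result = COGROUP r1 BY grp_key, r2 BY grp_key;

-- Drop the keys that have a row on only one side (the first and last keys).
result2 = FILTER result BY (NOT IsEmpty(r1)) AND (NOT IsEmpty(r2));

-- Flatten both bags into a single row per key.
result3 = FOREACH result2 GENERATE FLATTEN(r1), FLATTEN(r2);

-- Keep only (index, x, y) from each side and pack them as two tuples.
result4 = FOREACH result3 GENERATE
    TOTUPLE(r1::index, r1::x, r1::y),
    TOTUPLE(r2::index, r2::x, r2::y);

DUMP result4;
```

In this sketch each output row pairs a record with its predecessor, so the row with the larger index comes first. Swap the two TOTUPLE expressions if you want the pair in ascending order.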

The file I used for testing contains the following:

1,0.0,0.0
2,-0.1,-0.1
3,1.0,-2.2
Note that the file has no parentheses; if your input has them, a simple preprocessing script can strip them out.

The dumps of the intermediate results are:

dump r

(1,0.0,0.0)
(2,-0.1,-0.1)
(3,1.0,-2.2)
dump r1

((1,1,0.0,0.0))
((2,2,-0.1,-0.1))
((3,3,1.0,-2.2))
dump r2

((1,1,0.0,0.0))
((2,2,-0.1,-0.1))
((3,3,1.0,-2.2))
dump result

(1,{(1,1,0.0,0.0)},{})
(2,{(2,2,-0.1,-0.1)},{(2,1,0.0,0.0)})
(3,{(3,3,1.0,-2.2)},{(3,2,-0.1,-0.1)})
(4,{},{(4,3,1.0,-2.2)})
dump result2

(2,{(2,2,-0.1,-0.1)},{(2,1,0.0,0.0)})
(3,{(3,3,1.0,-2.2)},{(3,2,-0.1,-0.1)})
dump result3

(2,2,-0.1,-0.1,2,1,0.0,0.0)
(3,3,1.0,-2.2,3,2,-0.1,-0.1)
dump result4

((2,-0.1,-0.1),(1,0.0,0.0))
((3,1.0,-2.2),(2,-0.1,-0.1))

The following approach works for your case.
Input:

1,0.0,0.0
2,-0.1,-0.1
3,1.0,-2.2
PigScript:

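-- Load the comma-separated input with a typed schema.
A = LOAD 'input' USING PigStorage(',') AS (index:int,x:double,y:double);
-- B: rows that can be the first element of a pair.
B = FILTER A BY index>=1;
-- C: rows with index > 1, which can be the second element of a pair.
C = FILTER A BY index>1;
-- D: key each row of C by index - 1, the index of the row it pairs with.
D = FOREACH C GENERATE ($0-1) AS dindex,index,x,y;
-- E: match each row of B with the row whose index is one higher.
E = JOIN B BY index, D BY dindex;
-- F: pack each matched pair into a bag.
F = FOREACH E GENERATE TOBAG(TOTUPLE(B::index,B::x,B::y),TOTUPLE(D::index,D::x,D::y));
DUMP F;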
Output:

({(1,0.0,0.0),(2,-0.1,-0.1)})
({(2,-0.1,-0.1),(3,1.0,-2.2)})