Apache Pig: converting a single line into multiple lines

I want to write a Pig script for the following transformation.

Input:

ABC,DEF,GHI,JKL,AAA,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,BBB,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3

Expected output:

ABC,DEF,GHI,JKL,AAA,aaa,1,2,3
ABC,DEF,GHI,JKL,AAA,bbb,1,2,3
ABC,DEF,GHI,JKL,AAA,ccc,1,2,3
ABC,DEF,GHI,JKL,BBB,aaa,1,2,3
ABC,DEF,GHI,JKL,BBB,bbb,1,2,3
ABC,DEF,GHI,JKL,BBB,ccc,1,2,3

Can anyone help me?

You can write your own custom UDF, or try the following approach.

input.txt

ABC,DEF,GHI,JKL,AAA,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,BBB,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,CCC,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3
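Pig script:

-- Load each line as positional fields ($0, $1, ...) split on commas.
A = LOAD 'input.txt' USING PigStorage(',');
-- First group: shared prefix $0..$3, group label $4 (AAA),
-- then one output row per 4-field sub-record.
B = FOREACH A GENERATE $0,$1,$2,$3,
                       FLATTEN(TOTUPLE($4)),
                       FLATTEN(TOBAG(
                                     TOTUPLE($5..$8),
                                     TOTUPLE($9..$12),
                                     TOTUPLE($13..$16)
                                    )
                              );
-- Second group: same prefix, group label $17 (BBB).
C = FOREACH A GENERATE $0,$1,$2,$3,
                       FLATTEN(TOTUPLE($17)),
                       FLATTEN(TOBAG(
                                     TOTUPLE($18..$21),
                                     TOTUPLE($22..$25),
                                     TOTUPLE($26..$29)
                                    )
                              );
-- Note: the CCC group in input.txt ($30..$42) is not read by this script;
-- it would need a third FOREACH of the same shape added to the UNION.
D = UNION B,C;
DUMP D;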
Output:

(ABC,DEF,GHI,JKL,AAA,aaa,1,2,3)
(ABC,DEF,GHI,JKL,AAA,bbb,1,2,3)
(ABC,DEF,GHI,JKL,AAA,ccc,1,2,3)
(ABC,DEF,GHI,JKL,BBB,aaa,1,2,3)
(ABC,DEF,GHI,JKL,BBB,bbb,1,2,3)
(ABC,DEF,GHI,JKL,BBB,ccc,1,2,3)
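
The script hard-codes field positions, so each additional group needs another FOREACH. For the custom UDF route mentioned in the answer, the splitting logic might look like the sketch below. It is plain Python and only illustrates the logic; registering it as a Pig UDF is not shown. The function name `explode_record` and its parameters are hypothetical, and the layout it assumes (4 shared fields, then groups made of one label plus three 4-field sub-records) is inferred from the sample data only.

```python
def explode_record(line, prefix_len=4, subs_per_group=3, sub_width=4):
    """Split one wide comma-separated record into one row per sub-record.

    Assumed layout (inferred from the sample data, not a general rule):
    prefix_len shared fields, followed by repeated groups, each made of
    one group label and subs_per_group sub-records of sub_width fields.
    """
    fields = line.strip().split(",")
    prefix = fields[:prefix_len]
    rest = fields[prefix_len:]

    # One group = 1 label + subs_per_group * sub_width data fields.
    group_len = 1 + subs_per_group * sub_width
    if len(rest) % group_len != 0:
        raise ValueError("record does not match the assumed layout")

    rows = []
    for g in range(0, len(rest), group_len):
        label = rest[g]
        for s in range(subs_per_group):
            start = g + 1 + s * sub_width
            # Shared prefix + group label + one sub-record.
            rows.append(prefix + [label] + rest[start:start + sub_width])
    return rows


if __name__ == "__main__":
    sample = ("ABC,DEF,GHI,JKL,"
              "AAA,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,"
              "BBB,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3")
    for row in explode_record(sample):
        print(",".join(row))
```

Because the loop walks the record group by group, the same function would also handle a line with a third group (such as CCC in input.txt) without any change, as long as the layout assumption holds.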

What have you tried?