Apache pig 将清管器中的一条线转换为多条线
我想为下面的查询编写一个pig脚本 输入为:Apache pig 将清管器中的一条线转换为多条线,apache-pig,Apache Pig,我想为下面的查询编写一个pig脚本 输入为: ABC,DEF,GHI,JKL,AAA,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,BBB,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3 输出应为: ABC,DEF,GHI,JKL,AAA,aaa,1,2,3 ABC,DEF,GHI,JKL,AAA,bbb,1,2,3 ABC,DEF,GHI,JKL,AAA,ccc,1,2,3 ABC,DEF,GHI,JKL,BBB,aaa,1,2,3 ABC,DEF,GHI,JKL,
ABC,DEF,GHI,JKL,AAA,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,BBB,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3
输出应为:
ABC,DEF,GHI,JKL,AAA,aaa,1,2,3
ABC,DEF,GHI,JKL,AAA,bbb,1,2,3
ABC,DEF,GHI,JKL,AAA,ccc,1,2,3
ABC,DEF,GHI,JKL,BBB,aaa,1,2,3
ABC,DEF,GHI,JKL,BBB,bbb,1,2,3
ABC,DEF,GHI,JKL,BBB,ccc,1,2,3
有人能帮我吗?您可以编写自己的自定义自定义自定义项,或者尝试以下方法 input.txt
ABC,DEF,GHI,JKL,AAA,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,BBB,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,CCC,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3
PigScript:
A = LOAD 'input.txt' USING PigStorage(',');
B = FOREACH A GENERATE $0,$1,$2,$3,
FLATTEN(TOTUPLE($4)),
FLATTEN(TOBAG(
TOTUPLE($5..$8),
TOTUPLE($9..$12),
TOTUPLE($13..$16)
)
);
C = FOREACH A GENERATE $0,$1,$2,$3,
FLATTEN(TOTUPLE($17)),
FLATTEN(TOBAG(
TOTUPLE($18..$21),
TOTUPLE($22..$25),
TOTUPLE($26..$29)
)
);
D = UNION B,C;
DUMP D
(ABC,DEF,GHI,JKL,AAA,aaa,1,2,3)
(ABC,DEF,GHI,JKL,AAA,bbb,1,2,3)
(ABC,DEF,GHI,JKL,AAA,ccc,1,2,3)
(ABC,DEF,GHI,JKL,BBB,aaa,1,2,3)
(ABC,DEF,GHI,JKL,BBB,bbb,1,2,3)
(ABC,DEF,GHI,JKL,BBB,ccc,1,2,3)
输出:
A = LOAD 'input.txt' USING PigStorage(',');
B = FOREACH A GENERATE $0,$1,$2,$3,
FLATTEN(TOTUPLE($4)),
FLATTEN(TOBAG(
TOTUPLE($5..$8),
TOTUPLE($9..$12),
TOTUPLE($13..$16)
)
);
C = FOREACH A GENERATE $0,$1,$2,$3,
FLATTEN(TOTUPLE($17)),
FLATTEN(TOBAG(
TOTUPLE($18..$21),
TOTUPLE($22..$25),
TOTUPLE($26..$29)
)
);
D = UNION B,C;
DUMP D
(ABC,DEF,GHI,JKL,AAA,aaa,1,2,3)
(ABC,DEF,GHI,JKL,AAA,bbb,1,2,3)
(ABC,DEF,GHI,JKL,AAA,ccc,1,2,3)
(ABC,DEF,GHI,JKL,BBB,aaa,1,2,3)
(ABC,DEF,GHI,JKL,BBB,bbb,1,2,3)
(ABC,DEF,GHI,JKL,BBB,ccc,1,2,3)
你试过什么?