Apache Pig: How to access fields from multiple tuples in a bag
My Pig script groups records by AA, so each group holds a bag with several tuples. I need a single output row per key that contains the fields from every tuple in that bag. The output after the GROUP statement, the final output I need, and the DESCRIBE result are listed below the answer.
Instead of grouping by $0.AA, I suggest a self-join on C as follows:
A = LOAD 'average.txt' AS (line:chararray);
-- split each line into three whitespace-separated fields
B = FOREACH A GENERATE REGEX_EXTRACT_ALL(line, '^(.*?)\\s+(.*?)\\s+(.*?)') AS TUPLE(AA:chararray,BB:chararray,CC:chararray);
-- drop lines that did not match the pattern
C = FILTER B BY tuple_0.AA IS NOT NULL;
C = FOREACH C GENERATE tuple_0.AA AS AA, tuple_0.BB AS BB, tuple_0.CC AS CC; -- renaming columns to easy names
D = FOREACH C GENERATE AA, BB, CC; -- clone of C
CD = JOIN C BY AA, D BY AA;
CD2 = FOREACH CD
      GENERATE
        C::AA AS AA,
        C::BB AS CBB,
        C::CC AS CCC,
        D::BB AS DBB,
        D::CC AS DCC;
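One thing to watch: a self-join pairs every row with every row that shares the same AA, including itself, so CD2 also contains rows such as (1,a,b,a,b) and (1,c,d,a,b). A minimal sketch of one way to keep only the wanted row per key, assuming each key has exactly two rows with distinct BB values (as in the sample data); the alias CD3 is a name introduced here for illustration:

```pig
-- Assumption: two rows per AA, with different BB values.
-- Keeping only pairs where C's BB sorts before D's BB drops the
-- self-matches and one of the two mirrored orderings.
CD3 = FILTER CD2 BY CBB < DBB;
DUMP CD3;
```

For the sample data this leaves (1,a,b,c,d) and (2,e,f,g,h). With more than two rows per key, the filter would still return every ordered pair rather than a single wide row, so a different approach would be needed.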
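I hope this helps.
Gaurav, thanks for your suggestion. I will apply your logic and let you know the output.
@SivasakthiJayaraman, I hope my answer was helpful. If it was, please mark it as the accepted answer.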
Output after the GROUP statement:
(1,{((1,a,b)),((1,c,d))})
(2,{((2,e,f)),((2,g,h))})
Final output I need:
(1,a,b,c,d)
(2,e,f,g,h)
DESCRIBE output:
| D | group:chararray | C:bag{:tuple(tuple_0:tuple(AA:chararray,BB:chararray,CC:chararray))}