Apache Pig: How to access fields from multiple tuples in a bag


My Pig script:

Output after the GROUP statement:

The final output I need:

DESCRIBE output for the query:


Instead of grouping by $0.AA, I suggest a self-join on C, as follows:

A = LOAD 'average.txt' AS (line:chararray);
-- split each line into three whitespace-separated fields
B = FOREACH A GENERATE REGEX_EXTRACT_ALL(line, '^(.*?)\\s+(.*?)\\s+(.*?)') AS TUPLE(AA:chararray,BB:chararray,CC:chararray);
C = FILTER B BY tuple_0.AA IS NOT NULL; -- drop lines that did not match the pattern
C = FOREACH C GENERATE tuple_0.AA AS AA, tuple_0.BB AS BB, tuple_0.CC AS CC; -- rename columns to simpler names

D = FOREACH C GENERATE AA, BB, CC;  -- clone of C

CD = JOIN C BY AA, D BY AA;
CD2 = FOREACH CD 
         GENERATE 
            C::AA AS AA, 
            C::BB AS CBB, 
            C::CC AS CCC, 
            D::BB AS DBB,
            D::CC AS DCC;
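
One thing to watch for: a self-join pairs every row with every row that shares the same AA, including itself, so each AA with two rows yields four rows in CD2. A minimal sketch of one way to keep a single combined row per AA, assuming each AA has exactly two rows and their BB values differ (CD3 is a hypothetical relation name, not part of the script above):

```pig
-- Assumption: exactly two rows per AA, with distinct BB values.
-- Comparing BB on both sides drops the self-matches (CBB == DBB)
-- and one of the two mirrored pairs.
CD3 = FILTER CD2 BY CBB < DBB;
DUMP CD3;
```

With the sample data in this question, this should leave (1,a,b,c,d) and (2,e,f,g,h), though the order of rows printed by DUMP is not guaranteed.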

I hope this helps.

Gaurav, thanks for your suggestion. I will apply your logic and let you know the output.

@SivasakthiJayaraman, I hope my answer helped. If it did, please mark it as the accepted answer.

Output after the GROUP statement:

(1,{((1,a,b)),((1,c,d))})  
(2,{((2,e,f)),((2,g,h))})

Final output needed:

(1,a,b,c,d)  
(2,e,f,g,h)

DESCRIBE output:

| D     | group:chararray     | C:bag{:tuple(tuple_0:tuple(AA:chararray,BB:chararray,CC:chararray))}  