Apache Pig: joining by section

I need to propagate a field value from one row to the other records, based on record type. For example, my raw input is:

1,firefox,p  
1,,q
1,,r
1,,s
2,ie,p
2,,s
3,chrome,p
3,,r
3,,s
4,netscape,p

Desired result:

1,firefox,p  
1,firefox,q
1,firefox,r
1,firefox,s
2,ie,p
2,ie,s
3,chrome,p
3,chrome,r
3,chrome,s
4,netscape,p

I tried:

A = LOAD 'file1.txt' using PigStorage(',') AS (id:int,browser:chararray,type:chararray);
SPLIT A INTO B IF (type =='p'), C IF (type!='p' );
joined =  JOIN B BY id FULL, C BY id;
joinedFields = FOREACH joined GENERATE  B::id,  B::type, B::browser, C::id, C::type;
dump joinedFields;

The result I get is:

(,,,1,p  )
(,,,1,q)
(,,,1,r)
(,,,1,s)
(2,p,ie,2,s)
(3,p,chrome,3,r)
(3,p,chrome,3,s)
(4,p,netscape,,)

Any help is much appreciated.

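Pig is not exactly SQL; it is built around data flows, MapReduce, and groups (joins exist as well). You can get the result with a GROUP BY, a FILTER nested inside a FOREACH, and FLATTEN: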

inpt = LOAD 'file1.txt' using PigStorage(',') AS (id:int,browser:chararray,type:chararray);
grp = GROUP inpt BY id;
Result = FOREACH grp {
    P = FILTER inpt BY type == 'p'; -- keep only the 'p' records for this id
    PL = LIMIT P 1; -- make sure there is just one
    GENERATE FLATTEN(inpt.(id,type)), FLATTEN(PL.browser); -- convert the bags produced by GROUP back to rows
};
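
One detail worth checking against the output in the question: the first tuple prints as (,,,1,p  ), which suggests that the type field on the first input line carries trailing spaces, so type == 'p' would not match it. The following is a minimal sketch, assuming that whitespace is the cause, which trims the field with the built-in TRIM function before grouping. The alias names cleaned and filled are illustrative and do not come from the original post.

```pig
inpt = LOAD 'file1.txt' USING PigStorage(',') AS (id:int, browser:chararray, type:chararray);

-- Assumption: some type values carry stray whitespace (e.g. 'p  ').
-- TRIM removes leading and trailing whitespace so the comparison below matches.
cleaned = FOREACH inpt GENERATE id, browser, TRIM(type) AS type;

grp = GROUP cleaned BY id;
filled = FOREACH grp {
    P = FILTER cleaned BY type == 'p';  -- the record that carries the browser name
    PL = LIMIT P 1;                     -- keep at most one such record per id
    -- one output row per original record, each paired with the browser from PL
    GENERATE FLATTEN(cleaned.(id, type)), FLATTEN(PL.browser);
};
DUMP filled;
```

Note that FLATTEN on an empty bag produces no rows, so an id with no 'p' record would disappear from the output entirely. If such ids can occur, they need separate handling. The columns also come out in the order id, type, browser rather than id, browser, type.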

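Great, thank you very much. You are right, I will have to get out of the SQL mindset.

Think of it as a script in which you define multiple operations step by step, gradually adding to and changing the data; in many cases it happens in RAM. For example, see the calculation of mean, variance, and standard deviation:

This is a great help, alexeipab, thank you very much for your time. The way you did it makes a lot of sense.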