Apache Pig: flatten a bag without nulls


I have a Pig bag:

(1139-50052,Aquatic,Consumer,6,makarina,2,{(),(Unknown)})
(1139-50052,Aquatic,Consumer,6,jabong,2,{(),(),(),(Unknown)})
I need to flatten it without the null values:

(1139-50052,Aquatic,Consumer,6,makarina,2,Unknown)
(1139-50052,Aquatic,Consumer,6,jabong,2,Unknown)

Please advise.

One option is to pass the bag to the BagToString function, so that the null values are discarded, and then split the resulting string on the delimiter "_":

FLATTEN(STRSPLIT(BagToString(BagName),'_+'))

Besides your input, this also works for other combinations, as the following example shows.

Input:

1139-50052      Aquatic Consumer        6       makarina        2       {(),(Unknown)}
1139-50052      Aquatic Consumer        6       jabong  2       {(),(),(),(Unknown)}
1139-50052      Aquatic Consumer        6       test1   2       {(unknown1),(),(),(Unknown2)}
1139-50052      Aquatic Consumer        6       test2   2       {(unknown1),(unknown2),(),(Unknown3)}

PigScript:

-- Load the rows; the last field B is a bag of single-field tuples.
A = LOAD 'input' USING PigStorage() AS (f0,f1,f2,f3,f4,f5,B:{T:(f7)});
-- Join the bag into one string, split it on runs of '_', and flatten the result into columns.
B = FOREACH A GENERATE f0,f1,f2,f3,f4,f5,FLATTEN(STRSPLIT(BagToString(B),'_+'));
DUMP B;

Output:

(1139-50052,Aquatic,Consumer,6,makarina,2,Unknown)
(1139-50052,Aquatic,Consumer,6,jabong,2,Unknown)
(1139-50052,Aquatic,Consumer,6,test1,2,unknown1,Unknown2)
(1139-50052,Aquatic,Consumer,6,test2,2,unknown1,unknown2,Unknown3)
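
To see what each function contributes, the one-line expression can be split into two steps. This is only a minimal sketch, assuming the same 'input' file and schema as in the script above. The relation names J and S and the alias joined are hypothetical, and the explicit '_' argument just spells out the default delimiter of BagToString. Dumping J shows the intermediate string before the split; how empty tuples appear in it may depend on the Pig version.

```pig
-- Same load as in the script above: B is a bag of single-field tuples.
A = LOAD 'input' USING PigStorage() AS (f0,f1,f2,f3,f4,f5,B:{T:(f7)});

-- Step 1: join the bag's values into one chararray, with '_' between them.
-- 'joined' is a hypothetical alias used only for this illustration.
J = FOREACH A GENERATE f0,f1,f2,f3,f4,f5,BagToString(B,'_') AS joined;
DUMP J;

-- Step 2: split on runs of '_' (regex '_+') and flatten the resulting
-- tuple so that each piece becomes its own column.
S = FOREACH J GENERATE f0,f1,f2,f3,f4,f5,FLATTEN(STRSPLIT(joined,'_+'));
DUMP S;
```

Because the split is on "_", this approach assumes that the values in the bag never contain an underscore themselves.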