Apache Pig: join in Pig and create a new column
I have two datasets.
Definition of schema A - Name, city, state
A= {
Ram, Sunnyvale, CA
Soju, Austin, TX
Rathos, Bangalore, Karnataka
Mike, Portland, OR
}
B = {
Ram, Refund
Soju, Refund
}
I want to join these two tables based on status and get the following output:
Schema Definition - Name,City,State,RefundIssued (Yes/No)
Ram,Sunnyvale,CA,yes
Soju,Austin,TX,yes
Rathos,Bangalore,Karnataka,no
Mike,Portland,OR,no
I am not sure how to specify that I need an extra column, or which columns are logically required.
A = load 'data1.txt' using PigStorage(',') as (name: chararray,city: chararray,state: chararray);
B= load 'data2.txt' using PigStorage(',') as (name: chararray,type: chararray);
C = join A by name LEFT OUTER,B by name;
D = foreach C generate A::name as firstname,B::type as charge_type;
-- how do I add a new column, RefundIssued, that holds yes/no?
store D into '1outdata.txt';
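Note that, because of the way bincond works, RefundIssued can be 'true', 'false', or null. If you want to convert null (the left join found no match, or the field value is null) to false, use: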
E = foreach D generate name, city, state, (RefundIssued IS NULL ? 'False' : RefundIssued) as RefundIssued;
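For reference, here is a minimal end-to-end sketch of one way to produce the yes/no column directly from the left outer join. It reuses the file names and schemas from the question. It assumes that any row in B counts as a refund, so the flag is derived from whether the join found a match (B::name is null or not) rather than from the value of type. The output column name RefundIssued follows the desired schema in the question.

```pig
-- Load both datasets (file names and schemas as given in the question).
A = LOAD 'data1.txt' USING PigStorage(',') AS (name:chararray, city:chararray, state:chararray);
B = LOAD 'data2.txt' USING PigStorage(',') AS (name:chararray, type:chararray);

-- Left outer join keeps every row of A; B's fields are null where no match exists.
C = JOIN A BY name LEFT OUTER, B BY name;

-- Bincond: 'no' when the join found no matching row in B, otherwise 'yes'.
-- Assumption: every row in B represents a refund.
D = FOREACH C GENERATE
        A::name  AS name,
        A::city  AS city,
        A::state AS state,
        (B::name IS NULL ? 'no' : 'yes') AS RefundIssued;

STORE D INTO '1outdata.txt' USING PigStorage(',');
```

If B can also contain other charge types, the condition would need to test B::type instead. In that case, keep in mind that a comparison against a null field yields null, so the null case still has to be handled explicitly as shown above.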