Apache Pig: copying data from one relation to another in Pig
I have two relations:
Relation A:
101,Ankit-Reddy,08022017
102,Siddarth-Battacharya,08022017
103,Rajesh-Khanna,08022017
Relation B:
102,Ronit-Roy,09022017
103,Ranveer-Singh,09022017
107,sadiya-some,09022017
108,Raj-sharma,09022017
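IDs 102 and 103 already exist in A but appear in B with a different name and date, so they are existing records that need updating. IDs 107 and 108 are new records, so they should be carried over unchanged.
How can I update A to the latest version?
My final table should look like this: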
101,Ankit-Reddy,08022017
102,Ronit-Roy,08022017
103,Ranveer-Singh,08022017
107,sadiya-some,09022017
108,Raj-sharma,09022017
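Is there a Pig script that does this?
- Get the records that are only in A, say A1
- Get the records that are only in B, say B1
- Join A and B and create records with a1, b2, a3, say C1
- Union A1, B1, C1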
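A = LOAD 'test1.txt' USING PigStorage(',') AS (a1:int,a2:chararray,a3:chararray);
B = LOAD 'test2.txt' USING PigStorage(',') AS (b1:int,b2:chararray,b3:chararray);
-- A1: records only in A (no match in B); project back to three fields
A_JOIN = JOIN A BY a1 LEFT OUTER, B BY b1;
A_ONLY = FILTER A_JOIN BY b1 IS NULL;
A1 = FOREACH A_ONLY GENERATE a1, a2, a3;
-- B1: records only in B (no match in A); project back to three fields
B_JOIN = JOIN A BY a1 RIGHT OUTER, B BY b1;
B_ONLY = FILTER B_JOIN BY a1 IS NULL;
B1 = FOREACH B_ONLY GENERATE b1, b2, b3;
-- C1: records in both; keep the ID and date from A, take the name from B
C_JOIN = JOIN A BY a1, B BY b1;
C1 = FOREACH C_JOIN GENERATE a1, b2, a3;
-- All three inputs now have the same three-field layout
D = UNION A1, B1, C1;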
Output:
101,Ankit-Reddy,08022017
107,sadiya-some,09022017
108,Raj-sharma,09022017
102,Ronit-Roy,08022017
103,Ranveer-Singh,08022017
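The same merge can probably be written with a single FULL OUTER join instead of three joins and a union. The following is a minimal sketch, not the original answer's method. It assumes the same `test1.txt` and `test2.txt` inputs and schemas as above; the aliases `J` and `MERGED` and the output field names `id`, `name`, and `dt` are made up for illustration.

```pig
A = LOAD 'test1.txt' USING PigStorage(',') AS (a1:int, a2:chararray, a3:chararray);
B = LOAD 'test2.txt' USING PigStorage(',') AS (b1:int, b2:chararray, b3:chararray);

-- One full outer join keeps rows that exist only in A, only in B, or in both.
J = JOIN A BY a1 FULL OUTER, B BY b1;

-- Pick each output field with a bincond:
--   id:   from A when the row exists in A, otherwise from B
--   name: from B when the row exists in B (latest name), otherwise from A
--   dt:   from A when the row exists in A (original date), otherwise from B
MERGED = FOREACH J GENERATE
    (a1 IS NOT NULL ? a1 : b1) AS id,
    (b2 IS NOT NULL ? b2 : a2) AS name,
    (a3 IS NOT NULL ? a3 : b3) AS dt;

DUMP MERGED;
```

With the sample data, this should yield the same five records as the expected table, though Pig does not guarantee their order. The null checks assume that the name and date fields are never empty in the input files; if they can be, testing `a1` and `b1` in every bincond would be the safer choice.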