Pig JOIN: how to join two tables on multiple fields when one field in the key is optional?


Question: I'm a Pig beginner. Below are my two input tables.

Table 1: contains 3 columns (VID, TID and USID)
v1 TID101 US101 
v2 TID102 
v3 TID103 
v4 TID104 US104 
v5        US105 
v6        US106 

I want to join Table 1 and Table 2 and get the following output:

Expected Output:
v1 TID101 US101 p1
v2 TID102       p2
v3 TID103       p3
v4 TID104 US104 p4
v5        US105 p5

I tried an inner join as follows:

a= JOIN table1 BY (TID, USID), table2 BY (TID, USID);
b= FOREACH a GENERATE table1::vID, table1::TID, table1::USID, table2::PID;
But I only get the following output:


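I could try a left outer join, but it seems to me that when I join on multiple keys, both keys are required to match, and I cannot express an "or" condition. If a table1 record contains either a USID or a TID, I just want to fetch the PID from table2. I'm not sure what I'm missing, and I'd like to know the best way to get the expected output. Please help.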

Join on each column separately, UNION the two results, and apply DISTINCT to the final relation.

PigScript

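-- A is table 1 (a1=VID, a2=TID, a3=USID); B is table 2 (b1=PID, b2=TID, b3=USID)
A = LOAD 'test1.txt' USING PigStorage('\t') AS (a1:chararray,a2:chararray,a3:chararray);
B = LOAD 'test2.txt' USING PigStorage('\t') AS (b1:chararray,b2:chararray,b3:chararray);

-- One inner join per key column: first on TID, then on USID
A2 = JOIN A BY (a2), B BY (b2);
A3 = JOIN B BY (b3), A BY (a3);

-- Project both join results to the same four-column shape
C = FOREACH A2 GENERATE A::a1,A::a2,A::a3,B::b1;
D = FOREACH A3 GENERATE A::a1,B::b2,B::b3,B::b1;

-- Combine the matches and drop rows that matched on both keys
E = UNION C,D;
E1 = DISTINCT E;

DUMP E1;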
Output

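As a variant of the script above, here is a minimal sketch that always projects table 1's own columns in both branches, so the output shows table 1's TID and USID even when table 2's differ. It relies on two Pig behaviours: PigStorage loads empty fields as null, and an inner JOIN drops records whose join key is null. The file names, the relation aliases, and the column layout of table 2 (PID, TID, USID) are assumptions, not taken from the original data.

```pig
-- Assumed layout: table1 = (VID, TID, USID), table2 = (PID, TID, USID), tab-separated.
-- 'table1.txt' and 'table2.txt' are placeholder file names.
t1 = LOAD 'table1.txt' USING PigStorage('\t') AS (VID:chararray, TID:chararray, USID:chararray);
t2 = LOAD 'table2.txt' USING PigStorage('\t') AS (PID:chararray, TID:chararray, USID:chararray);

-- Inner joins skip records whose key is null, so each join only sees
-- the table1 rows that actually carry that key.
by_tid  = JOIN t1 BY TID,  t2 BY TID;
by_usid = JOIN t1 BY USID, t2 BY USID;

-- Project table1's columns plus the matched PID in both branches.
tid_hits  = FOREACH by_tid  GENERATE t1::VID AS VID, t1::TID AS TID, t1::USID AS USID, t2::PID AS PID;
usid_hits = FOREACH by_usid GENERATE t1::VID AS VID, t1::TID AS TID, t1::USID AS USID, t2::PID AS PID;

-- A row that matched on both keys appears in both branches; DISTINCT collapses it.
both   = UNION tid_hits, usid_hits;
result = DISTINCT both;

DUMP result;
```

Table 1 rows with no match on either key do not appear in the result, since both joins are inner joins.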

Thanks. This was very helpful.