Apache Pig: joining fields from different relations


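There are two relations:

r1: {f1: chararray, f2: chararray}
r2: {f3: chararray, f4: chararray}
Neither relation has a unique key, but both contain the same number of tuples.


Is there a way to join the corresponding fields of the two relations to get output like f2, f4?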

If the tuples are in the right order, you can use RANK:

r1a = RANK r1 BY * DENSE;
r2a = RANK r2 BY * DENSE;

r1r2 = JOIN r1a BY $0, r2a BY $0;
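
For reference, here is a minimal sketch of the same idea carried through to the requested f2, f4 output. It swaps in plain RANK (no BY clause), which prepends a sequential number to each tuple instead of ranking by field values, so rows are paired by position. It assumes the order in which each relation is loaded reflects the intended pairing. The file names a.csv and b.csv and the alias `result` are placeholders for illustration.

```pig
-- Sketch only: pair rows by position, assuming load order is the intended order.
r1 = LOAD 'a.csv' USING PigStorage(',') AS (f1:chararray, f2:chararray);
r2 = LOAD 'b.csv' USING PigStorage(',') AS (f3:chararray, f4:chararray);

-- Without a BY clause, RANK prepends a sequential number to each tuple.
r1a = RANK r1;
r2a = RANK r2;

-- $0 is the prepended rank column in both relations.
r1r2 = JOIN r1a BY $0, r2a BY $0;

-- Project only the fields of interest, using the join's alias prefixes.
result = FOREACH r1r2 GENERATE r1a::f2, r2a::f4;

DUMP result;
```

Because the join key is the row number, the result has one tuple per input row rather than one per pair of rows.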

Another option is to use CROSS.

Note, from the documentation: CROSS is an expensive operation and should be used sparingly.

Pig script:

R1 = LOAD 'a.csv'  USING  PigStorage(',') AS (f1:chararray,f2:chararray);
R2 = LOAD 'b.csv'  USING  PigStorage(',') AS (f3:chararray,f4:chararray);

R3 = CROSS R1,R2;

R4 = FOREACH R3 GENERATE f2,f4;

DUMP R4;
Input:

a.csv:

f1_value,f2_value
b.csv:

f3_Value,f4_value
Output (DUMP R4):

(f2_value,f4_value)

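Shouldn't the join be on $0? CROSS doesn't work here, because you don't want a full cross join; you only want to join rows that have the same row number.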