Apache Pig: avoiding prefixes in multi-relation joins - Fatal编程技术网

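I am trying to perform a star-schema style join in Pig; my code is below. When I join multiple relations on different columns, each time I have to prefix the column with the names of the earlier joins to make it work. I believe there must be a better way, but I couldn't find one by searching. Any pointers would be very helpful.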

That is, I want to avoid prefixing columns like this: "H864::H86::hs_8_d::hs_8_desc".

hs_8 = LOAD 'hs_8_distinct' USING PigStorage('^') as (hs_8:chararray,hs_8_desc:chararray);
hs_8_d = FOREACH hs_8 GENERATE SUBSTRING(hs_8,0,2) as hs_2,SUBSTRING(hs_8,0,4) as hs_4,SUBSTRING(hs_8,0,6) as hs_6,hs_8,hs_8_desc;

hs_6_d = LOAD 'hs_6_distinct' USING PigStorage('^') as (hs_6:chararray,hs_6_desc:chararray);
hs_4_d = LOAD 'hs_4_distinct' USING PigStorage('^') as (hs_4:chararray,hs_4_desc:chararray);
hs_2_d = LOAD 'hs_2_distinct' USING PigStorage('^') as (hs_2:chararray,hs_2_desc:chararray);

H86 = JOIN hs_8_d BY hs_6, hs_6_d BY hs_6 USING 'replicated' ;
H864 = JOIN H86 BY hs_8_d::hs_4, hs_4_d BY hs_4 USING 'replicated' ;
H8642 = JOIN H864 BY H86::hs_8_d::hs_2, hs_2_d BY hs_2 USING 'replicated' ;

hs_dim = FOREACH H8642 GENERATE hs_2_d::hs_2,hs_2_d::hs_2_desc,H864::hs_4_d::hs_4,H864::hs_4_d::hs_4_desc,H864::H86::hs_6_d::hs_6,H864::H86::hs_6_d::hs_6_desc,H864::H86::hs_8_d::hs_8,H864::H86::hs_8_d::hs_8_desc;
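Rather than deriving the prefixed names by hand, you can ask Pig for them: the DESCRIBE operator prints the schema of a relation, including the prefixes added by each join. A minimal sketch, assuming the relations from the script above are defined:

```pig
-- Print the schema of each intermediate relation. Every field is listed
-- under the name Pig assigned after the join, e.g. hs_8_d::hs_4 in H86
-- and H86::hs_8_d::hs_4 in H864, so the names can be copied directly.
DESCRIBE H86;
DESCRIBE H864;
DESCRIBE H8642;
```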

Pig will always prefix fields with bagname:: after a join to disambiguate them. Unfortunately, I don't think you can avoid this.
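A minimal illustration of this disambiguation, using two hypothetical relations A and B (file names and fields are made up for this sketch). Both inputs contain a field x, so after the join each field has to be referenced through its input alias:

```pig
-- Hypothetical inputs: both relations contain a field named x.
A = LOAD 'data_a' USING PigStorage('^') AS (x:chararray, a_desc:chararray);
B = LOAD 'data_b' USING PigStorage('^') AS (x:chararray, b_desc:chararray);

-- The join output schema is (A::x, A::a_desc, B::x, B::b_desc).
C = JOIN A BY x, B BY x;

-- A bare x would be ambiguous here, so the alias prefix is required.
D = FOREACH C GENERATE A::x, A::a_desc, B::b_desc;

-- Joining C with a third relation nests the prefix once more
-- (C::A::x, C::A::a_desc, ...), which is how names such as
-- H864::H86::hs_8_d::hs_8_desc build up in the question.
```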

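You can simplify the aliases somewhat by wrapping each join in an extra FOREACH that renames the fields. Judging by the job statistics, this does not add any MapReduce jobs to the pipeline; as with the original script, it produces 4 map-only jobs.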

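For example:

-- Join hs_8_d with hs_6_d; x1..x5 are hs_8_d's fields, x6..x7 are hs_6_d's.
H86 = foreach (JOIN hs_8_d BY hs_6, hs_6_d BY hs_6 USING 'replicated') generate
        hs_8_d::hs_2 as x1,
        hs_8_d::hs_4 as x2,
        hs_8_d::hs_6 as x3,
        hs_8_d::hs_8 as x4,
        hs_8_d::hs_8_desc as x5,
        hs_6_d::hs_6 as x6,
        hs_6_d::hs_6_desc as x7;

-- Join on x2 (hs_4); y1..y7 carry H86's fields, y8..y9 are hs_4_d's.
H864 = foreach (JOIN H86 BY x2, hs_4_d BY hs_4 USING 'replicated') generate
          H86::x1 as y1,
          H86::x2 as y2,
          H86::x3 as y3,
          H86::x4 as y4,
          H86::x5 as y5,
          H86::x6 as y6,
          H86::x7 as y7,
          hs_4_d::hs_4 as y8,
          hs_4_d::hs_4_desc as y9;

-- Join on y1 (hs_2); z1..z9 carry H864's fields, z10..z11 are hs_2_d's.
H8642 = foreach (JOIN H864 BY y1, hs_2_d BY hs_2 USING 'replicated') generate
          H864::y1 as z1,
          H864::y2 as z2,
          H864::y3 as z3,
          H864::y4 as z4,
          H864::y5 as z5,
          H864::y6 as z6,
          H864::y7 as z7,
          H864::y8 as z8,
          H864::y9 as z9,
          hs_2_d::hs_2 as z10,
          hs_2_d::hs_2_desc as z11;

-- hs_2, hs_2_desc, hs_4, hs_4_desc, hs_6, hs_6_desc, hs_8, hs_8_desc
hs_dim = FOREACH H8642 GENERATE z10, z11, z8, z9, z6, z7, z4, z5;

If you have a bag of tuples, then … might help.

This gets complicated with more than 3 relations, and I find it hard to work out the long prefixes. How do you deal with this? Or is there a simple way to derive the prefix? If I join 20+ relations, I think this could become very complicated. I'm new to Pig, so I'd love to know how this is handled.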
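For many relations, one option is a variation on the x/y/z renaming from the answer: after each join, rename the fields back to their original names instead of positional ones. Each later join then only ever sees a single level of prefix (H86::hs_4, hs_4_d::hs_4_desc), however many joins came before. This is a sketch, assuming the hs_8_d, hs_6_d, hs_4_d and hs_2_d relations from the question; it is an alternative to the answer's script, not an addition to it.

```pig
-- Join 1: strip the prefixes and drop the duplicate join key hs_6_d::hs_6.
H86 = FOREACH (JOIN hs_8_d BY hs_6, hs_6_d BY hs_6 USING 'replicated') GENERATE
        hs_8_d::hs_2      AS hs_2,
        hs_8_d::hs_4      AS hs_4,
        hs_8_d::hs_6      AS hs_6,
        hs_8_d::hs_8      AS hs_8,
        hs_8_d::hs_8_desc AS hs_8_desc,
        hs_6_d::hs_6_desc AS hs_6_desc;

-- Join 2: H86 has plain names, so only one level of prefix appears here.
H864 = FOREACH (JOIN H86 BY hs_4, hs_4_d BY hs_4 USING 'replicated') GENERATE
        H86::hs_2         AS hs_2,
        H86::hs_4         AS hs_4,
        H86::hs_6         AS hs_6,
        H86::hs_8         AS hs_8,
        H86::hs_8_desc    AS hs_8_desc,
        H86::hs_6_desc    AS hs_6_desc,
        hs_4_d::hs_4_desc AS hs_4_desc;

-- Join 3: same pattern, producing the final dimension directly.
hs_dim = FOREACH (JOIN H864 BY hs_2, hs_2_d BY hs_2 USING 'replicated') GENERATE
        H864::hs_2        AS hs_2,
        hs_2_d::hs_2_desc AS hs_2_desc,
        H864::hs_4        AS hs_4,
        H864::hs_4_desc   AS hs_4_desc,
        H864::hs_6        AS hs_6,
        H864::hs_6_desc   AS hs_6_desc,
        H864::hs_8        AS hs_8,
        H864::hs_8_desc   AS hs_8_desc;

-- Optional: print the logical, physical and MapReduce plans to inspect
-- how many jobs the script compiles to.
EXPLAIN hs_dim;
```

Because every join step starts from unprefixed names, adding another dimension only means writing one more JOIN/FOREACH pair in the same shape; the names never grow deeper than relation::field.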