Nested filter in Hadoop Pig
I want to run a nested filter statement in Pig. For example, given this query:
select trim(udc1.drky) drky,
trim(udc1.drsy) drsy,
trim(udc1.drrt) drrt,
trim(udc1.drdl01) drld01,
'Fixed' as AssetType
from f0005 udc1
where trim(udc1.drsy) = '12'
and trim(udc1.drrt) = 'C2'
and trim(udc1.drky) not in (
select trim(drky)
from f0005
where trim(drsy) = '57' and trim(drrt) = 'AC'
)
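I need to convert the query above into a Pig script, but I don't know how to take the filter from the inner query and correlate it with the outer query. I could write a Pig UDF as a last resort, but I would prefer a solution in native Pig.
Please help me solve this.

Assume your input is as follows, in this layout: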
drky, drsy, drtt, drld01
1,57,AC,999
2,57,AC,899
2,12,C2,799
1,12,C2,699
4,57,BC,990
5,12,C3,998
6,12,C2,997
Based on your query, the expected output is:
6,12,C2,997
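In Pig, you can achieve this with a left outer join: join every row against the keys returned by the inner query, then keep only the rows that found no match. See the code below: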
-- Load the comma-separated input.
records = LOAD '/user/user/inputfiles/assets.txt' USING PigStorage(',') AS (drky:chararray, drsy:chararray, drtt:chararray, drld01:chararray);
-- Inner query: collect the keys to exclude.
records_filter = FILTER records BY drsy == '57' AND drtt == 'AC';
records_each = FOREACH records_filter GENERATE drky AS drky_temp;
-- Left outer join: rows whose key is not in the excluded set get a null drky_temp.
records_join = JOIN records BY drky LEFT OUTER, records_each BY drky_temp;
-- Outer query: keep unmatched rows that also satisfy the outer WHERE conditions.
records_join_filter = FILTER records_join BY drky_temp IS NULL AND drsy == '12' AND drtt == 'C2';
records_output = FOREACH records_join_filter GENERATE drky, drsy, drtt, drld01, 'FIXED' AS asset_type;
DUMP records_output;
Output of the Pig script above:
6,12,C2,997,FIXED
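
The same NOT IN logic can also be written as an anti-join with COGROUP and IsEmpty. This is only a minimal sketch of an alternative, not the method from the answer above. It assumes the same input path and schema, and the relation names (raw, candidates, excluded, excluded_keys, grouped, unmatched, result) are made up for illustration. It also applies TRIM to each field to mirror the trim() calls in the original SQL, on the assumption that the real data may contain padded values.

```pig
-- Load the raw rows and trim each field, mirroring trim() in the SQL query.
raw = LOAD '/user/user/inputfiles/assets.txt' USING PigStorage(',')
      AS (drky:chararray, drsy:chararray, drtt:chararray, drld01:chararray);
records = FOREACH raw GENERATE TRIM(drky) AS drky, TRIM(drsy) AS drsy,
                               TRIM(drtt) AS drtt, TRIM(drld01) AS drld01;

-- Outer-query candidates and the keys selected by the inner query.
candidates = FILTER records BY drsy == '12' AND drtt == 'C2';
excluded = FILTER records BY drsy == '57' AND drtt == 'AC';
excluded_keys = FOREACH excluded GENERATE drky;

-- Group both relations by key and keep groups with no excluded key.
grouped = COGROUP candidates BY drky, excluded_keys BY drky;
unmatched = FILTER grouped BY IsEmpty(excluded_keys);

-- Unnest the surviving candidate rows and append the constant column.
result = FOREACH unmatched GENERATE FLATTEN(candidates), 'Fixed' AS asset_type;
DUMP result;
```

Filtering the candidates before the COGROUP keeps the grouped relation small, and because the excluded keys arrive as a bag per key, duplicate keys in the inner query cannot multiply the output rows.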