
Filtering records in Pig


Below is the data:

col1,col2,col3,col4,col5
------------------------
10,20,30,40,dollar
20,30,40,50,dollar
20,30,10,50,dollar
61,62,63,64,dollar
61,62,63,64,pound
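
col1, col2, and col3 together form a unique key. The use case is to filter the data based on col5: for a given key combination, the record whose col5 value is 'dollar' should be filtered out, provided the same combination also has a record with the value 'pound'.

The expected output is: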

col1,col2,col3,col4,col5
------------------------
10,20,30,40,dollar
20,30,40,50,dollar
20,30,10,50,dollar
61,62,63,64,pound

How should I proceed from here, since there is no special operator for this in Pig like there is in Hive?

A = load 'test1.csv' using PigStorage(',') as (col1:int,col2:int,col3:int,col4:int,col5:chararray);
B = FILTER A BY col5 == 'pound';

Get all the records with 'pound', then get all the 'dollar' records whose key combination does not match any combination that has 'pound' in col5. Finally, UNION them.
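
To make the logic concrete, here is a minimal Python sketch of those three steps run on the sample rows from the question. It is not Pig, only an illustration of the same idea; the names `rows`, `key`, and `filter_records` are made up for this example. The Pig script for the same steps follows.

```python
# Sample rows from the question: (col1, col2, col3, col4, col5)
rows = [
    (10, 20, 30, 40, "dollar"),
    (20, 30, 40, 50, "dollar"),
    (20, 30, 10, 50, "dollar"),
    (61, 62, 63, 64, "dollar"),
    (61, 62, 63, 64, "pound"),
]


def key(row):
    """Return the (col1, col2, col3) combination that identifies a record."""
    return row[:3]


def filter_records(rows):
    # Step 1: all records whose col5 is 'pound'.
    pound = [r for r in rows if r[4] == "pound"]
    pound_keys = {key(r) for r in pound}

    # Step 2: records whose key has no 'pound' record at all
    # (the role played by the left outer join plus "is null" filter).
    unmatched = [r for r in rows if key(r) not in pound_keys]

    # Step 3: combine both sets, like UNION.
    return pound + unmatched


result = filter_records(rows)
for r in result:
    print(r)
```

On the sample data this keeps the 'pound' record for key (61, 62, 63) and drops the 'dollar' record that shares that key; the other three 'dollar' records are untouched.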

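-- Records whose col5 is 'pound'
B = FILTER A BY col5 == 'pound';
-- Left outer join keeps every record of A; B's fields are null where the key has no 'pound' record
C = JOIN A BY (col1,col2,col3) LEFT OUTER, B BY (col1,col2,col3);
D = FILTER C BY (B::col1 is null);
-- Keep only A's columns
E = FOREACH D GENERATE A::col1,A::col2,A::col3,A::col4,A::col5;
-- Combine the 'pound' records with the unmatched records
F = UNION B,E;
DUMP F;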
Output