Filtering a list by a list of integers using Pig Latin
My list, lista.csv, looks like this:
client-id priority client-start assignment
12345 1 1250125125 13
1246 3 1250122156 27
12616 1 1250122351 3
...
I also have another list, listb.csv, which looks like a vector:
125125
124214
1246
125
...
What I want to do is filter lista so that it keeps only the clients whose IDs also appear in listb.
I tried something like this, but it does not work:
raw = LOAD 'lista.csv' USING PigStorage('\t') AS (client-id: int, priority: int, client-start: int, assignment: int);
s4q = LOAD 'listb.csv' USING PigStorage('\t') AS (survs4id: int);
s4id = FOREACH s4q {
    dd = FILTER raw by (client-id == s4q);
    GENERATE dd;
}
DUMP dd;
Is there a way to solve this?
Join the two relations and keep only the matching records. The inner join itself acts as the filter.
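-- Pig field names cannot contain hyphens, so client-id becomes client_id.
raw = LOAD 'lista.csv' USING PigStorage('\t') AS (client_id: int, priority: int, client_start: int, assignment: int);
s4q = LOAD 'listb.csv' USING PigStorage('\t') AS (survs4id: int);
-- An inner join keeps only the rows of raw whose client_id appears in s4q.
s4id = JOIN raw BY client_id, s4q BY survs4id;
-- Drop the extra survs4id column ($4) that the join appends.
dd = FOREACH s4id GENERATE $0, $1, $2, $3;
DUMP dd;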
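If listb.csv is small enough to fit in memory, a fragment-replicate join may be a cheaper way to do the same filtering, since it runs map-side. The following is only a sketch under a few assumptions: the field names use underscores (client_id, client_start) because Pig does not accept hyphens in identifiers, the alias names ids and joined are made up for illustration, and listb.csv may contain duplicate IDs, which are removed first so that no client row is emitted twice.

```pig
raw = LOAD 'lista.csv' USING PigStorage('\t') AS (client_id: int, priority: int, client_start: int, assignment: int);
s4q = LOAD 'listb.csv' USING PigStorage('\t') AS (survs4id: int);

-- Remove duplicate IDs so the join cannot multiply rows of raw.
ids = DISTINCT s4q;

-- 'replicated' loads the last (small) relation into memory on each mapper;
-- this assumes ids really is small.
joined = JOIN raw BY client_id, ids BY survs4id USING 'replicated';

-- Keep only the original columns of raw, referenced by their qualified names.
dd = FOREACH joined GENERATE raw::client_id, raw::priority, raw::client_start, raw::assignment;
DUMP dd;
```

With the sample data shown in the question, only the row for client 1246 would be expected to survive, since it is the only listed ID present in both files.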