Pig 10: filtering a chararray with IS NOT NULL and != '' does not work

I'm new to Pig and have been experimenting with it, but I've hit a roadblock.

Suppose I have the following:

dump test;

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
(3,null)
(4,null)

I want to filter "test" to remove the nulls, so I do this:

filter_test = filter test by test.column2 is not null;

I expect that to give me something like this:

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)

But it returns the same thing as before. The null rows are not removed.

I'm using Pig 10, and the date column is of type chararray.

Thanks for your help.

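Your column2 does not contain null values; it contains the string 'null'. The two examples that follow contrast 'null' stored as a string with a real null value.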

Example 1: null as a string
input.txt

1,2014-04-08 12:09:23.0
2,2014-04-08 12:09:23.0
3,null
4,null

Pig script:

A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);
B = FILTER A BY f2!='null';
DUMP B;

Output:

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)

Example 2: real null value
input.txt

1,2014-04-08 12:09:23.0
2,2014-04-08 12:09:23.0
3,
4,

Pig script:

A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);
B = FILTER A BY f2 is not null;
DUMP B;

Output:

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
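
If the input could contain both kinds of missing value, the two conditions can probably be combined into a single filter. This is only a minimal sketch: it assumes the same comma-separated input.txt layout and schema as the examples in this answer, and it has not been run against Pig 10 specifically.

```pig
-- Sketch: keep only rows where f2 is neither a real null (empty field)
-- nor the literal string 'null'. File name and schema are assumptions
-- carried over from the examples in this answer.
A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int, f2:chararray);
B = FILTER A BY (f2 IS NOT NULL) AND (f2 != 'null');
DUMP B;
```

The explicit IS NOT NULL check is mostly for readability: a comparison against a null value evaluates to null in Pig, and FILTER drops rows whose condition is not true, so f2 != 'null' alone would likely discard real nulls as well.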

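Awesome! It worked! I see now that a real NULL would be an empty bag/tuple. I really thought I had tried your approach, but I guess I hadn't. Thanks.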