Hadoop: checking whether a bag is empty inside a Pig FOREACH

I am joining three tables, and inside the FOREACH I need to check whether the ReadStagingData bag is empty. Here is the code:

ReadStagingData = Load 'Staging_data.csv' Using PigStorage(',') As (PL_Posn_id:int,Brok_org_dly:double,Brok_org_ptd:double);

ReadPriorData = Load 'ptd.csv' Using PigStorage(',') As (PL_Posn_id:int,Brok_org_ptd:double);

ReadPriorFunctional = Load 'Functional.csv' Using PigStorage(',') AS (PL_Posn_id:int,Brok_fun_ptd:double,Brok_fun_ltd:double);

JoinDS1 = JOIN ReadPriorData BY PL_Posn_id,ReadPriorFunctional BY PL_Posn_id;

JoinDS2 = JOIN ReadStagingData BY PL_Posn_id LEFT OUTER, JoinDS1 BY ReadPriorData::PL_Posn_id;

X = Foreach JoinDS2 {
    test = (NOT(IsEmpty(ReadStagingData))); -- Error on this line
    GENERATE test,ReadStagingData::PL_Posn_id,
    ReadStagingData::Brok_org_dly,
    (ReadStagingData::Brok_org_ptd is not null ? ReadStagingData::Brok_org_ptd:ReadPriorData::Brok_org_ptd+ReadStagingData::Brok_org_dly);
};

Dump X;

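When I run the code above, I get an error saying that ReadStagingData is an invalid projection. Please help.

In your relation X, ReadStagingData is not a bag. The notation ReadStagingData::Brok_org_dly does not denote a projection from a bag; it is a top-level field that is named this way after the join so that every field has a unique name. ReadStagingData is therefore just a prefix.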
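To make the difference between a prefix and a bag concrete, here is a minimal sketch that reuses two of the load statements from the question. The aliases Joined, Grouped and WithStaging are hypothetical names introduced only for this illustration. It assumes that what is wanted is a per-key bag: COGROUP produces one bag per input relation, whereas JOIN produces flat fields.

```pig
-- Same load statements as in the question.
ReadStagingData = LOAD 'Staging_data.csv' USING PigStorage(',')
    AS (PL_Posn_id:int, Brok_org_dly:double, Brok_org_ptd:double);
ReadPriorData = LOAD 'ptd.csv' USING PigStorage(',')
    AS (PL_Posn_id:int, Brok_org_ptd:double);

-- JOIN flattens both inputs: every column becomes a top-level field such as
-- ReadStagingData::PL_Posn_id, so there is no bag to pass to IsEmpty.
Joined = JOIN ReadStagingData BY PL_Posn_id LEFT OUTER, ReadPriorData BY PL_Posn_id;
DESCRIBE Joined;

-- COGROUP keeps one bag per input relation, named after that relation,
-- so IsEmpty(ReadStagingData) is a valid expression on its output.
Grouped = COGROUP ReadStagingData BY PL_Posn_id, ReadPriorData BY PL_Posn_id;
DESCRIBE Grouped;

-- Keep only the keys that have at least one staging record.
WithStaging = FILTER Grouped BY NOT IsEmpty(ReadStagingData);
DUMP WithStaging;
```

Comparing the two DESCRIBE outputs should show flat, prefixed fields for Joined and bag-typed fields for Grouped.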

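Also, I am not sure why you want to check this: because you are doing a LEFT OUTER join, X will never contain a record that has no corresponding record in ReadStagingData. If you were doing a RIGHT OUTER join, it would be a different matter.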

If you intended to perform a RIGHT OUTER join and want to check whether the fields from ReadStagingData are NULL, I would do the following:

rsdIsNull = ReadStagingData::PL_Posn_id IS NULL;
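As a rough sketch of how that check might fit into the full script, the version below switches the second join to RIGHT OUTER and tests the join key for NULL inside the GENERATE clause. The output names has_staging, PL_Posn_id, Brok_org_dly and Brok_org_ptd are hypothetical, and the sketch assumes that the short ReadPriorData:: prefix still resolves unambiguously after the second join, as the question's own code relies on.

```pig
ReadStagingData = LOAD 'Staging_data.csv' USING PigStorage(',')
    AS (PL_Posn_id:int, Brok_org_dly:double, Brok_org_ptd:double);
ReadPriorData = LOAD 'ptd.csv' USING PigStorage(',')
    AS (PL_Posn_id:int, Brok_org_ptd:double);
ReadPriorFunctional = LOAD 'Functional.csv' USING PigStorage(',')
    AS (PL_Posn_id:int, Brok_fun_ptd:double, Brok_fun_ltd:double);

JoinDS1 = JOIN ReadPriorData BY PL_Posn_id, ReadPriorFunctional BY PL_Posn_id;

-- RIGHT OUTER keeps every JoinDS1 row; the ReadStagingData fields are NULL
-- for rows that have no matching staging record.
JoinDS2 = JOIN ReadStagingData BY PL_Posn_id RIGHT OUTER,
               JoinDS1 BY ReadPriorData::PL_Posn_id;

X = FOREACH JoinDS2 GENERATE
    -- 1 when a staging record matched, 0 otherwise (assumed flag encoding).
    (ReadStagingData::PL_Posn_id IS NULL ? 0 : 1) AS has_staging,
    ReadPriorData::PL_Posn_id AS PL_Posn_id,
    ReadStagingData::Brok_org_dly AS Brok_org_dly,
    -- Same expression as in the question. Note that for unmatched rows
    -- Brok_org_dly is NULL, so the sum in the else branch is NULL as well.
    (ReadStagingData::Brok_org_ptd IS NOT NULL
        ? ReadStagingData::Brok_org_ptd
        : ReadPriorData::Brok_org_ptd + ReadStagingData::Brok_org_dly) AS Brok_org_ptd;

DUMP X;
```

Testing the join key rather than a data column is a deliberate choice here: a data column such as Brok_org_ptd can be NULL in a matched row, while the key of a matched row cannot.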