Apache Pig: confusion with FLATTEN or SUBSTRING and JOIN

I have two datasets, data1 and data2. data2 contains the following data:

a1:u:11#eve:f:6
a1:u:12#eve:f:6
a1:u2:13#eve:f:3
a1:u1:12#eve:s:6
a1:u1:11#eve:f:6
Here both : and # are delimiters. I ultimately generate data2 as follows:

LOAD '$data2' USING PigStorage(':') AS
                 (ad: chararray,
                  a_id: chararray,
                  cid_eve1: chararray,
                  name: chararray,
                  len: int);
Then I split the third column into two:

FOREACH data2 GENERATE
                  ad AS ad,
                  a_id AS a_id,
                  FLATTEN(STRSPLIT(cid_eve1, '#')) AS (cid: int, eve1: chararray),
                  name AS name,
                  len AS len;
Now, when I join data2 with data1, I get nothing back.

I also tried:

FOREACH data2 GENERATE
                  ad AS ad,
                  a_id AS a_id,
                  SUBSTRING(cid_eve1,0,INDEXOF(cid_eve1,'#',0)) AS cid: int,
                  name AS name,
                  len AS len;
This also returns nothing on the join. I am joining on the third column, cid.

I even dumped data2 in both cases and looked at the output, and it is what one would expect. But when I use the following file as data2:

a1:u:11:eve:f:6
a1:u:12:eve:f:6
a1:u2:13:eve:f:3
a1:u1:12:eve:s:6
a1:u1:11:eve:f:6
and load it as

LOAD '$data2' USING PigStorage(':') AS
             (ad: chararray,
              a_id: chararray,
              cid: int,
              eve1: chararray,
              name: chararray,
              len: int);
then the join returns the correct result. I don't understand why this happens. Can anyone help or offer a suggestion?

In data1, the second column ($1) is a_id and the last column is cid. Both of them take part in the join.

1,u,true,true,4,1,1,1,1,1,11,21,31,11
1,u,true,true,4,1,1,1,1,1,11,21,32,11
1,u,true,true,4,1,1,1,1,1,11,21,33,11
1,u,true,true,4,1,1,1,1,1,11,21,31,11
1,u,true,true,4,1,1,1,1,1,11,21,32,11
1,u,true,true,4,1,1,1,1,1,11,21,33,11
2,u,true,true,4,1,1,1,1,1,12,22,34,12
2,u,true,true,4,1,1,1,1,1,13,22,35,13
2,u1,true,false,4,1,1,1,1,0,12,22,34,12
2,u1,true,false,4,1,1,1,1,0,13,22,35,13
2,u1,true,true,9,1,1,1,1,1,12,22,34,12
2,u1,true,true,9,1,1,1,1,1,13,22,35,13
3,u,false,false,4,1,0,1,0,0,14,24,31,14
3,u,false,false,4,1,0,1,0,0,11,22,31,11
4,u,true,NULL,0,1,1,0,0,0,11,22,33,11
4,u1,false,NULL,0,1,0,0,0,0,11,22,33,11
2,u,true,true,4,1,1,1,1,1,12,22,34,12
2,u,true,true,4,1,1,1,1,1,13,22,35,13
2,u2,true,true,7,1,1,1,1,1,12,22,34,12
2,u2,true,true,7,1,1,1,1,1,13,22,35,13

I found the answer. The problem was the data type: I was trying to read a chararray into an int without an explicit cast.

When I changed it to

FOREACH data2 GENERATE
              ad AS ad,
              a_id AS a_id,
              (int)SUBSTRING(cid_eve1,0,INDEXOF(cid_eve1,'#',0)) AS cid,
              name AS name,
              len AS len;
it worked.
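
The same fix should carry over to the FLATTEN(STRSPLIT(...)) variant from the question. What follows is a minimal sketch, not a tested script. The aliases data2_raw, data2_split, data2_typed, data1_raw, data1_keys and joined, the field name cid_str, and the $data1 parameter are hypothetical names introduced here for illustration. Joining on both a_id and cid is an assumption based on the description of data1. The idea is to split into chararray fields first and then cast with (int) explicitly, rather than relying on the type declared in the AS clause.

```pig
-- Load data2 with ':' as the delimiter; the third field still holds "cid#eve1".
data2_raw = LOAD '$data2' USING PigStorage(':') AS
                 (ad: chararray,
                  a_id: chararray,
                  cid_eve1: chararray,
                  name: chararray,
                  len: int);

-- Split on '#' into two chararray fields (the limit of 2 yields at most two parts).
data2_split = FOREACH data2_raw GENERATE
                  ad, a_id,
                  FLATTEN(STRSPLIT(cid_eve1, '#', 2)) AS (cid_str: chararray, eve1: chararray),
                  name, len;

-- Cast explicitly so that cid is converted to an int, not just labeled as one.
data2_typed = FOREACH data2_split GENERATE
                  ad, a_id, (int)cid_str AS cid, eve1, name, len;

-- Assumption: data1 is comma-delimited, with a_id in $1 and cid in $13 (the last column).
-- Only the join keys are projected here to keep the sketch short.
data1_raw  = LOAD '$data1' USING PigStorage(',');
data1_keys = FOREACH data1_raw GENERATE (chararray)$1 AS a_id, (int)$13 AS cid;

-- Both sides now carry the same key types: (chararray, int).
joined = JOIN data1_keys BY (a_id, cid), data2_typed BY (a_id, cid);
```

Running DESCRIBE data2_typed and DUMP joined afterwards is one way to check the schema and confirm that the join now returns rows.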