Hadoop: Error when using FILTER in Pig; dumping the result fails

The code I am using in Pig is:

studentsR = LOAD 'hdfs://quickstart.cloudera:8020/students/students' using PigStorage() as (name:chararray,rollno:int);
resultR = LOAD 'hdfs://quickstart.cloudera:8020/students/results' using PigStorage() as (rollno:int,result:chararray);
joniR = JOIN studentsR BY rollno,resultR BY rollno;
filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result) ;
filterRPass = FILTER filterR BY resultR.result == 'pass';
dump filterRPass;

The error is as follows:

ERROR 0: Scalar has more than one row in the output. 1st : (1,fail), 2nd :(2,fail)

Try running DUMP and DESCRIBE on each relation to see the output and schema of every alias you use.

Modifications:

My input files use a space as the delimiter, so I used PigStorage(' ').

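In filterR, I removed the opening and closing parentheses around studentsR::name, studentsR::rollno, resultR::result. With the parentheses, the three fields are wrapped in a single tuple, which is why the DUMP output shows an extra pair of parentheses.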

grunt> filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
grunt> describe  filterR;
filterR: {org.apache.pig.builtin.totuple_studentsR::name_100: (studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray)}
grunt> filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
grunt> describe  filterR;
filterR: {studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray}
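
In filterRPass, I used resultR::result instead of resultR.result. After the JOIN, resultR::result names the result field that came from resultR, whereas resultR.result is read as a scalar projection of the whole resultR relation, which is only valid when that relation has exactly one row; hence the "Scalar has more than one row in the output" error.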
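
Putting the changes above together, the corrected script might look like the sketch below. It assumes the space-delimited local files named students and results shown in this answer and Pig started in local mode (pig -x local); the paths are illustrative and would differ on HDFS.

```pig
-- Sketch of the corrected script; assumes space-delimited local files
-- named 'students' and 'results' (Pig started with: pig -x local).
studentsR = LOAD 'students' USING PigStorage(' ') AS (name:chararray, rollno:int);
resultR   = LOAD 'results'  USING PigStorage(' ') AS (rollno:int, result:chararray);

-- Inner join on the roll number.
joniR = JOIN studentsR BY rollno, resultR BY rollno;

-- No parentheses around the field list, so each output row stays flat
-- instead of being wrapped in a single tuple.
filterR = FOREACH joniR GENERATE studentsR::name, studentsR::rollno, resultR::result;

-- '::' refers to the field carried over from resultR by the join;
-- 'resultR.result' would be treated as a scalar projection and fail.
filterRPass = FILTER filterR BY resultR::result == 'pass';

DUMP filterRPass;
```

With the sample data in this answer, the final DUMP should print the single row (c,3,pass).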

For testing, I used a set of local files and ran Pig in local mode.

cat students
a 1
b 2
c 3

cat results
3 pass
2 fail
5 pass

Dump results:

dump studentsR
(a,1)
(b,2)
(c,3)

dump resultR
(3,pass)
(2,fail)
(5,pass)

dump joniR
(b,2,2,fail)
(c,3,3,pass)

dump filterR --filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
((b,2,fail))
((c,3,pass))

dump filterR --filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
(b,2,fail)
(c,3,pass)

dump filterRPass; --filterRPass = FILTER filterR BY resultR::result == 'pass';  --or-- filterRPass = FILTER filterR BY $2 == 'pass';
(c,3,pass)

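Thanks for the answer... it worked like a charm. I am new to Hadoop and Pig; can you explain when to use resultR::result, resultR:result, and resultR.result?

@AnshulBisht As far as I know, the single colon is used when loading/assigning (in the schema, as in result:chararray), and the double colon is used when retrieving/referring to the data afterwards (for example, a field after a join).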
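
To illustrate the three notations raised in the comments, here is a small sketch that reuses the relations from the question. The aliases passR, passByPosR, allR, cntR and withTotalR and the field name total are hypothetical, introduced only for this example.

```pig
-- ':' appears only inside a schema, between a field name and its type.
studentsR = LOAD 'students' USING PigStorage(' ') AS (name:chararray, rollno:int);
resultR   = LOAD 'results'  USING PigStorage(' ') AS (rollno:int, result:chararray);

joniR = JOIN studentsR BY rollno, resultR BY rollno;

-- '::' disambiguates a field by the relation it came from after a JOIN.
passR = FILTER joniR BY resultR::result == 'pass';

-- The same field can also be addressed by position
-- ($0 = name, $1 = rollno, $2 = rollno, $3 = result).
passByPosR = FILTER joniR BY $3 == 'pass';

-- '.' on another relation is a scalar projection; it is only valid
-- when that relation has exactly one row, as cntR does here.
allR = GROUP resultR ALL;
cntR = FOREACH allR GENERATE COUNT(resultR) AS total;
withTotalR = FOREACH resultR GENERATE rollno, result, cntR.total;
```

In the question, resultR has several rows, so resultR.result cannot be used as a scalar, which matches the reported error.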