Hadoop Apache Pig: ORDER BY fails with java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer


I am running an ORDER BY in Apache Pig. It fails with the error java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

Here is the relevant part of the code:

  movies = LOAD 'ml-1m/movies.dat' Using PigStorage('\t') as (movieblob:chararray);
A = FOREACH movies GENERATE STRSPLIT (movieblob,'::',3) as movie1:(id:int,title:chararray,genre:chararray);
B = FOREACH A GENERATE movie1.id, movie1.title, TOKENIZE (movie1.genre,'|');
C = FOREACH B GENERATE id, title, FLATTEN ($2) as genre;
C_FLAT = FOREACH C GENERATE FLATTEN ($0) as id:int, FLATTEN ($1) as
title:chararray, FLATTEN ($2) as genre:chararray;
\de C_FLAT 
C_FLAT: {id: int,title: chararray,genre: chararray}
C_FLAT_ORDER = ORDER C_FLAT BY id;
\d C_FLAT_ORDER
2017-01-20 16:17:53,727 [main] ERROR org.apache.pig.tools.grunt.Grunt -   
ERROR 1066: Unable to open iterator for alias C_FLAT_ORDER. Backend error : 
java.lang.ClassCastException: java.lang.String cannot be cast to  
java.lang.Integer
The workaround I tried was to store the relation and load it back with a typed schema:

store C_FLAT INTO 'ml-1m/movies/output' Using PigStorage('|');
m1 = LOAD 'ml-1m/movies/output' Using PigStorage('|') as
(id:int,title:chararray,genre:chararray);
order_m1 = ORDER m1 BY id;
grunt> \de m1
m1: {id: int,title: chararray,genre: chararray}
grunt> \d order_m1 
A few lines of the output:

(3945,Digimon: The Movie (2000),Children's)
(3946,Get Carter (2000),Action)
(3946,Get Carter (2000),Drama)
(3946,Get Carter (2000),Thriller)
(3947,Get Carter (1971),Thriller)
(3948,Meet the Parents (2000),Comedy)
(3949,Requiem for a Dream (2000),Drama)
I also tried an explicit cast:

C1 = FOREACH C GENERATE (int)id, title,genre;
followed by an ORDER BY id, and that failed as well. Any help would be appreciated.

Some sample data from the input file 'ml-1m/movies.dat':

 3945::Digimon: The Movie (2000)::Adventure|Animation|Children's
 3946::Get Carter (2000)::Action|Drama|Thriller
 3947::Get Carter (1971)::Thriller
 3948::Meet the Parents (2000)::Comedy
 3949::Requiem for a Dream (2000)::Drama
 3950::Tigerland (2000)::Drama
 3951::Two Family House (2000)::Drama
 3952::Contender, The (2000)::Drama|Thriller

If you just want to sort the data, this is all you need:

movies = LOAD '/tmp/ml-1m/movies.dat' Using PigStorage('\t') as (movieblob:chararray);
A = FOREACH movies GENERATE FLATTEN(STRSPLIT(movieblob, '::', 3)) AS (id:int,title:chararray,genres:chararray);
B = ORDER A BY id;

I think your problem comes from trying to turn this

(3946,Get Carter (2000),Action|Drama|Thriller)
into this

(3946,Get Carter (2000),Action)
(3946,Get Carter (2000),Drama)
(3946,Get Carter (2000),Thriller)
In that case, running \d on the ordered movies relation produces:

(3945,Digimon: The Movie (2000),Adventure)
(3945,Digimon: The Movie (2000),Animation)
(3945,Digimon: The Movie (2000),Children's)
(3946,Get Carter (2000),Action)
(3946,Get Carter (2000),Drama)
(3946,Get Carter (2000),Thriller)
(3947,Get Carter (1971),Thriller)
(3948,Meet the Parents (2000),Comedy)
(3949,Requiem for a Dream (2000),Drama)
(3950,Tigerland (2000),Drama)
(3951,Two Family House (2000),Drama)
(3952,Contender, The (2000),Drama)
(3952,Contender, The (2000),Thriller)
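
One way to get those per-genre rows is sketched below. It reuses the FLATTEN(STRSPLIT(...)) load from the snippet above and adds a FLATTEN(TOKENIZE(...)) step, as in the question. The alias names split_movies, movie_genres and movies_by_id are made up for illustration and are not taken from the original answer. Treat it as a sketch under those assumptions rather than the exact script that was run.

```pig
-- Alias names are illustrative (assumed), not from the original answer.
movies = LOAD '/tmp/ml-1m/movies.dat' USING PigStorage('\t') AS (movieblob:chararray);

-- Split each line on '::' into at most 3 fields, then flatten the
-- resulting tuple into top-level columns with the declared schema.
split_movies = FOREACH movies GENERATE
    FLATTEN(STRSPLIT(movieblob, '::', 3)) AS (id:int, title:chararray, genres:chararray);

-- TOKENIZE returns a bag of genre tokens; FLATTEN on a bag emits
-- one output row per token, repeating id and title.
movie_genres = FOREACH split_movies GENERATE
    id, title, FLATTEN(TOKENIZE(genres, '|')) AS genre:chararray;

-- Sort on the integer id and print the result.
movies_by_id = ORDER movie_genres BY id;
DUMP movies_by_id;
```

The main difference from the script in the question is that the STRSPLIT tuple is flattened directly into typed columns, instead of being kept as a nested tuple (movie1) whose fields are projected later. ORDER BY id only fixes the order of the ids, so rows that share an id may come out in any genre order.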

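I still think Spark DataFrames would be more useful.

Please show the original file.

Thanks for the reply. "::" is not a Unicode escape; it is two colons, each "\u003A". From what I have read, the storage delimiter can only be a single character. PigStorage('::') fails with "pig script failed to validate: java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[::]'", so I used STRSPLIT to split each line into 3 valid fields. The challenge I am facing is getting the ordering to work. Somehow the cast is not working properly. I am also just trying Pig out, not using it for production. I really appreciate you taking the time to look at it.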