Hadoop: How can I group by a column in the format "a, b, c" loaded from a Hive table? (Hadoop, Hive, Apache Pig)


I'm definitely a newbie, and I bet the answer is simple. My code looks like this:

A = load 'table1' USING org.apache.hive.hcatalog.pig.HCatLoader() as (userid1: chararray, location: chararray, age: int);
The location column in Hive looks like this:
city, state, country

This is what I'm doing:

B= GROUP A BY location;
C= FOREACH B GENERATE
    group as location,
    SUM(rating) as RatingSum,
    AVG(rating) as RatingAverage,
    MIN(rating) as RatingMin,
    MAX(rating) as RatingMax,
    COUNT(rating) as RecNum;
C doesn't work, probably because this is the output of B:

(, ,,{(56072,, ,,,56072,1885171218,3),(104462,, ,,,104462,8486054060,7),(46927,, ,,47,46927,0749300523,0),(46927,, ,,47,46927,0749300515,0),(64139,, ,,,64139,8422665662,0),(112345,, ,,39,112345,0375727345,7),(151458,, ,,,151458,1551667959,0),(64139,, ,,,64139,8422676095,6)})
(ny, ,,{(175362,ny, ,,,175362,0446604844,10)})
(, usa,,{(223496,, usa,,,223496,0714838500,7)})
(gap, ,,{(211944,gap, ,,42,211944,044023722X,9),(211944,gap, ,,42,211944,1577486445,8),(211944,gap, ,,42,211944,0821767089,9),(211944,gap, ,,42,211944,0804106304,0),(211944,gap, ,,42,211944,0743412621,9),(211944,gap, ,,42,211944,0505521474,7),(211944,gap, ,,42,211944,0440236673,9),(211944,gap, ,,42,211944,0440225701,0),(211944,gap, ,,42,211944,044022165X,0),(211944,gap, ,,42,211944,0440214041,0),(211944,gap, ,,42,211944,0440213525,0),(211944,gap, ,,42,211944,044020111X,0),(211944,gap, ,,42,211944,0425151867,0),(211944,gap, ,,42,211944,0385472951,8),(211944,gap, ,,42,211944,0373832257,7),(211944,gap, ,,42,211944,0373471521,5),(211944,gap, ,,42,211944,0373291574,6),(211944,gap, ,,42,211944,0373291566,7),(211944,gap, ,,42,211944,0373201532,7),(211944,gap, ,,42,211944,0373151861,8),(211944,gap, ,,42,211944,158660242X,0)})
(n/a, ,,{(169489,n/a, ,,,169489,0618150730,6)})
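I'm sure I need to change the load statement, or add a step before grouping, or both, but I'm lost. Please help. For reference, the original Hive table was created from a CSV file with semicolons as the delimiter. In case it helps, here is the code I used to create the table in Hive: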

    create table table1 (UserID string, Location string, Age INT)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES ("separatorChar" = '\u0059') STORED AS TEXTFILE
    tblproperties ("skip.header.line.count"="1");

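Assuming rating is one of the columns loaded in A, you have to refer to it as A.rating in C. After GROUP, each row of B holds the group key plus a bag named A, so the aggregate functions need the bag projection A.rating rather than the bare field name: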

C= FOREACH B GENERATE
    group as location,
    SUM(A.rating) as RatingSum,
    AVG(A.rating) as RatingAverage,
    MIN(A.rating) as RatingMin,
    MAX(A.rating) as RatingMax,
    COUNT(A.rating) as RecNum;
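
If the goal is to group by one part of the "city, state, country" string rather than by the whole string, one possible approach is to split the column before grouping. The following is only a sketch under assumptions: it assumes A already carries a numeric rating field (the load statement in the question does not show one), and the relation names A1, A2, B2 and C2 are hypothetical. STRSPLIT and TRIM are Pig built-ins; rows that contain fewer than two commas may need separate handling.

```pig
-- Assumption: A has a chararray field "location" and a numeric field "rating".
-- Split "city, state, country" into at most three pieces on the comma.
A1 = FOREACH A GENERATE
    FLATTEN(STRSPLIT(location, ',', 3))
        AS (city: chararray, state: chararray, country: chararray),
    rating;

-- Remove the whitespace left around each piece by the split.
A2 = FOREACH A1 GENERATE
    TRIM(city) AS city,
    TRIM(state) AS state,
    TRIM(country) AS country,
    rating;

-- Group by a single part of the location (country here, as an example).
B2 = GROUP A2 BY country;

-- Inside the FOREACH, the grouped bag is named A2, so project A2.rating.
C2 = FOREACH B2 GENERATE
    group AS country,
    SUM(A2.rating) AS RatingSum,
    AVG(A2.rating) AS RatingAverage,
    MIN(A2.rating) AS RatingMin,
    MAX(A2.rating) AS RatingMax,
    COUNT(A2.rating) AS RecNum;
```

Grouping by city or state instead would only mean changing the key in the GROUP statement and the alias given to group.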

Where does the rating column come from? Your load has 3 fields, and none of them is rating!