Counting distinct items in Apache Pig


I have a users table with the following schema:

|Column 1 |USER ID |int|
|Column 2 |EMAIL |chararray|
|Column 3 |LANGUAGE |chararray|
|Column 4 |LOCATION |chararray|

and a transactions table with the following schema:

|Column 1 |ID |int|
|Column 2 |PRODUCT |int|
|Column 3 |USER ID |int|
|Column 4 |PURCHASE AMOUNT |double|
|Column 5 |DESCRIPTION |chararray|

Problem: find the count of distinct locations for each product.

I wrote the following Pig script:

user = LOAD '/tmp/users.txt' USING PigStorage ('    ')
AS (USER_ID:int, EMAIL:chararray, LANGUAGE:chararray, LOCATION:chararray);

transaction = LOAD '/tmp/transaction.txt' USING PigStorage ('   ')
AS (ID:int, PRODUCT:int,USER_ID:int, PURCHASE_AMOUNT:double,DESCRIPTION:chararray);

u1 = JOIN user by USER_ID, transaction by USER_ID;

u2 = GROUP u1 by LOCATION;

Result = FOREACH u2 GENERATE COUNT(u2.PRODUCT);

DUMP Result;
Error: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: Invalid scalar projection: u2

This is what I get.

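In your script, the line Result = FOREACH u2 GENERATE COUNT(u2.PRODUCT); is wrong. After the GROUP operation the data structure changes: each tuple of u2 holds a group key and a bag named u1 containing the joined rows, so PRODUCT has to be reached through that bag rather than through u2. You can see this by running DESCRIBE u2. Try the following; I am assuming the fields in the txt files are comma-separated: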

user = LOAD 'user.txt' USING PigStorage (',') AS (USER_ID:int, EMAIL:chararray, LANGUAGE:chararray, LOCATION:chararray);

transaction = LOAD 'transaction.txt' USING PigStorage (',') AS (ID:int, PRODUCT:int,USER_ID:int, PURCHASE_AMOUNT:double,DESCRIPTION:chararray);

u1 = JOIN user by USER_ID, transaction by USER_ID;

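-- Group by the (LOCATION, PRODUCT) pair; each group carries a bag named u1.
u2 = GROUP u1 by (LOCATION,PRODUCT);

-- $1 is the bag u1, so COUNT($1) is the number of joined rows in each group.
Result = FOREACH u2 GENERATE FLATTEN(group) as (LOCATION,PRODUCT), COUNT($1);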

DUMP Result;
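
The script above counts the joined rows for each (LOCATION, PRODUCT) pair. If the goal is instead the number of distinct locations per product, as the question's wording suggests, one possible approach is a nested FOREACH with DISTINCT. The following is only a sketch under the same assumption of comma-separated input files; the aliases by_product, locations and distinct_locations and the field name NUM_LOCATIONS are hypothetical names introduced here for illustration.

```pig
-- Same inputs and join as above (comma-separated files are an assumption).
user = LOAD 'user.txt' USING PigStorage(',')
    AS (USER_ID:int, EMAIL:chararray, LANGUAGE:chararray, LOCATION:chararray);

transaction = LOAD 'transaction.txt' USING PigStorage(',')
    AS (ID:int, PRODUCT:int, USER_ID:int, PURCHASE_AMOUNT:double, DESCRIPTION:chararray);

u1 = JOIN user BY USER_ID, transaction BY USER_ID;

-- One group per product; the joined rows sit in a bag named u1.
by_product = GROUP u1 BY PRODUCT;

-- Inspect the grouped schema to see the group key and the u1 bag.
DESCRIBE by_product;

-- Inside the nested block, reduce the bag to its distinct LOCATION values
-- and count them. COUNT skips tuples whose first field is null.
distinct_locations = FOREACH by_product {
    locations = DISTINCT u1.LOCATION;
    GENERATE group AS PRODUCT, COUNT(locations) AS NUM_LOCATIONS;
};

DUMP distinct_locations;
```

Grouping by PRODUCT alone and applying DISTINCT inside the FOREACH keeps one output row per product, whereas grouping by (LOCATION, PRODUCT) produces one row per pair.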