Hadoop: How to count the number of unique users with Pig

The code below doesn't quite return what I'm trying to compute: the number of unique users. Any ideas?

data = LOAD 'input_initial' AS (user_id,item_id,rating,timestamp);
data = FOREACH data GENERATE user_id,item_id;
STORE data INTO 'input_final';
data_users = FOREACH data GENERATE user_id;
group_users = GROUP data_users BY user_id;
count_users = FOREACH group_users GENERATE COUNT(data_users);
STORE count_users INTO 'count_users';

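GROUP ... BY user_id produces one row per distinct user, so your script emits a record count for each user rather than a single total. You need to add a final grouping step that operates on ALL instead of a single field, and then count the grouped rows: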

group_users = GROUP data_users BY user_id;
grp_all = GROUP group_users ALL;
count_users = FOREACH grp_all GENERATE COUNT(group_users);

I'm afraid that doesn't work. Have you tested it and run it successfully?
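
For comparison, here is a minimal sketch of another common way to get a single unique-user count: deduplicate the user IDs with DISTINCT first, then group everything with ALL and count. It assumes the same 'input_initial' file and field layout as in the question; the aliases users and uniq_users are made up for illustration, and it has not been run against the asker's data.

```pig
-- Load the ratings data (default tab-delimited PigStorage, as in the question).
data = LOAD 'input_initial' AS (user_id, item_id, rating, timestamp);

-- Keep only the user_id column.
users = FOREACH data GENERATE user_id;

-- Collapse repeated user_ids so each user appears exactly once.
uniq_users = DISTINCT users;

-- Put every remaining row into a single group so COUNT yields one total.
grp_all = GROUP uniq_users ALL;

-- COUNT skips tuples whose first field is null, so rows with a null user_id are not counted.
count_users = FOREACH grp_all GENERATE COUNT(uniq_users);

STORE count_users INTO 'count_users';
```

Running DUMP count_users; in place of the STORE is a quick way to check on a small sample that the output is a single row.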