Hadoop Pig: how to reference fields after joining and grouping


I have this code in Pig (win, request and response are just tables loaded directly from the file system):

Basically, I want to sum the bid prices after the joins and the grouping, but I get an error:

Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

My guess is that I am not referencing win.bid_price correctly.

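When performing multiple joins, I suggest giving your fields names that are unique across all relations (e.g., bid_id). Alternatively, you can use the "::" disambiguation operator, but that can get quite messy.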

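-- Every field name is unique across the three relations, so after the joins
-- each one can still be referenced without a relation:: prefix.
wins = LOAD '/user/hadoop/rtb/wins' USING PigStorage(',') AS (f1_w:int, f2_w:int, f3_w:chararray);
reqs = LOAD '/user/hadoop/rtb/reqs' USING PigStorage(',') AS (f1_r:int, f2_r:int, f3_r:chararray);
resps = LOAD '/user/hadoop/rtb/resps' USING PigStorage(',') AS (f1_rp:int, f2_rp:int, f3_rp:chararray);

-- f1_r exists only in reqs, so it is unambiguous in the second join.
wins_reqs = JOIN wins BY f1_w, reqs BY f1_r;
wins_reqs_reps = JOIN wins_reqs BY f1_r, resps BY f1_rp;

win_group = GROUP wins_reqs_reps BY (f3_w);

-- After GROUP, the joined tuples sit in a bag named after the grouped relation
-- (wins_reqs_reps), so the field to sum is projected from that bag.
win_sum = FOREACH win_group GENERATE group, SUM(wins_reqs_reps.f2_w);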
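The "::" alternative mentioned in the answer could look roughly like the sketch below. The question's own LOAD statements are not shown, so the paths and the field names (id, bid_price, campaign, ts, status) are hypothetical; only the relation names win, request and response come from the question. After a JOIN, Pig prefixes each field with the name of the relation it came from plus "::", and a second JOIN adds another prefix on top, so the fully qualified names grow with each join. DESCRIBE prints the resulting schema so you can see which names are available.

```pig
-- Hypothetical paths and schemas; only the relation names come from the question.
win      = LOAD '/path/to/win'      USING PigStorage(',') AS (id:int, bid_price:double, campaign:chararray);
request  = LOAD '/path/to/request'  USING PigStorage(',') AS (id:int, ts:long);
response = LOAD '/path/to/response' USING PigStorage(',') AS (id:int, status:int);

-- Each BY clause refers to its own relation's schema, so a bare id is fine here.
-- Result fields: win::id, win::bid_price, win::campaign, request::id, request::ts
win_req = JOIN win BY id, request BY id;

-- A bare id would now be ambiguous in win_req, so qualify it as win::id.
-- Result fields: win_req::win::id, win_req::win::bid_price, ..., response::id, response::status
win_req_resp = JOIN win_req BY win::id, response BY id;

-- Print the prefixed schema to check the exact field names.
DESCRIBE win_req_resp;

grouped = GROUP win_req_resp BY win_req::win::campaign;

-- The grouped tuples live in a bag named win_req_resp; project the fully qualified field.
bid_sum = FOREACH grouped GENERATE group, SUM(win_req_resp.win_req::win::bid_price);

DUMP bid_sum;
```

Compared with the unique-name version in the answer, every reference after the second join carries two prefixes, which is the messiness the answer warns about.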