Apache Pig: how to group on a bag

I tried the following:

C = FOREACH A GENERATE dataMap#'$key_name' AS C_ID, dataMap#'$key_name2' AS methodName, pool, time;
D = GROUP C BY (C_ID);
E = FOREACH D {
    sorted = ORDER C BY time DESC;
    -- project from the sorted bag so each flow keeps its time order
    GENERATE group, sorted.methodName AS Flow;
};
F = GROUP E BY (Flow);
G = FOREACH F {
    GENERATE group, COUNT(E) AS FlowKount;
};
STORE G INTO '$output' USING PigStorage();
But I get the error: using a bag as a key is not supported.
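Pig rejects a bag as a GROUP key, so the usual workaround is to turn each bag into a single scalar value first (for example a chararray built with the builtin BagToString, available since Pig 0.11 as far as I know) and group on that. The plain-Python sketch below only models the C → D → E step under assumed input: raw rows of (C_ID, methodName, time). It orders each C_ID's methods by time descending, which is what the nested `sorted` relation is meant to do, and stores each flow as a tuple so it is hashable. The function `build_flows` and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_flows(rows):
    """Model of C -> D -> E: one time-ordered flow per C_ID.

    rows: iterable of (c_id, method_name, time) tuples (assumed layout).
    Returns {c_id: flow}, where flow is a tuple of method names ordered
    by time descending, like `ORDER C BY time DESC` in the nested FOREACH.
    """
    by_id = defaultdict(list)
    for c_id, method, time in rows:
        by_id[c_id].append((time, method))

    flows = {}
    for c_id, events in by_id.items():
        # Sort on time only, newest first.
        events.sort(key=lambda event: event[0], reverse=True)
        # A tuple is hashable, so it can later act as a grouping key;
        # a list cannot, much as Pig rejects a bag as a GROUP key.
        flows[c_id] = tuple(method for _, method in events)
    return flows


# Hypothetical raw rows: (C_ID, methodName, time).
rows = [
    ("c1", "m3", 3), ("c1", "m1", 9), ("c1", "m2", 5),
    ("c3", "m2", 8), ("c3", "m1", 6), ("c3", "m3", 1),
]
print(build_flows(rows))
# {'c1': ('m1', 'm2', 'm3'), 'c3': ('m2', 'm1', 'm3')}
```

Without the ordering step, the method order inside each flow would depend on how rows happen to arrive, and flows such as (m1, m2, m3) and (m2, m1, m3) could not be told apart reliably.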

The data corresponding to E in the script above:

c1 {(m1), (m2), (m3) }
c2 {(m1), (m2), (m3) }
c3 {(m2), (m1), (m3) }
c4 {(m1), (m2), (m3) }
c5 {(m2), (m1), (m3) }
I need the output to be:

{(m1), (m2), (m3) } {(c1),(c2),(c4)} 3
{(m2), (m1), (m3) } {(c3),(c5)} 2
That is: the method flow, the C_IDs that share it, and their count.
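Once each flow is a hashable value, the F/G step is an ordinary group-and-count. The sketch below reproduces the desired output from the E data listed above; in Pig terms it corresponds to grouping on the converted flow key and generating the key, the bag of C_IDs, and COUNT. The function name `group_by_flow` is hypothetical, and the first-seen output order is a Python detail (Pig does not guarantee the order of GROUP output).

```python
def group_by_flow(flows):
    """Model of F = GROUP E BY Flow; G = FOREACH F GENERATE ... COUNT(E).

    flows: iterable of (c_id, flow) pairs, where flow is a tuple of method
    names standing in for the bag. Returns a list of (flow, c_ids, count)
    in first-seen order (plain dicts keep insertion order in Python 3.7+).
    """
    groups = {}
    for c_id, flow in flows:
        groups.setdefault(flow, []).append(c_id)
    return [(flow, ids, len(ids)) for flow, ids in groups.items()]


# The E relation from the question, with each bag written as a tuple.
E = [
    ("c1", ("m1", "m2", "m3")),
    ("c2", ("m1", "m2", "m3")),
    ("c3", ("m2", "m1", "m3")),
    ("c4", ("m1", "m2", "m3")),
    ("c5", ("m2", "m1", "m3")),
]

for flow, ids, count in group_by_flow(E):
    print(flow, ids, count)
# ('m1', 'm2', 'm3') ['c1', 'c2', 'c4'] 3
# ('m2', 'm1', 'm3') ['c3', 'c5'] 2
```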

This is a way of detecting duplicate flows, i.e. the same sequence of methods appearing in the bags of different C_IDs.


Can anyone guide me on how to achieve this?

Please post a sample of the input data you are using. Could you give us some dummy data?

482be07c3b {(bool User::c::m1()const),(bool User::ClassName::m2()const),(ulong User::ClassName::m3()const),(ulong User::ClassName::m4()const),(virtual void User_Verify::Requirements::m5(ulong)),(bool User::ClassName::m1()const),(bool User::ClassName::m3()const)}
482be02c3b {(bool User::c::m1()const),(bool User::ClassName::m2()const),(ulong User::ClassName::m3()const),(ulong User::ClassName::m4()const),(virtual void User::Verify::Requirements::m5(ulong)),(bool User::ClassName::m1()const),(bool User::ClassName::m3()const)}

Above is the sample data. I need to group the tuples whose second field is a bag and find the count for each group; that is, I need to find all tuples that have the same bag contents.
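One caveat with flattening a bag into a string key: the method names in this sample contain underscores (`User_Verify`), spaces, and parentheses. If the elements are joined with a delimiter that can also occur inside a name, two different flows can collapse into the same key. If I recall correctly, BagToString defaults to an underscore delimiter, so it is worth passing an explicit delimiter that never appears in method names (check the docs for your Pig version). The Python sketch below shows the collision with hypothetical flows; `string_key` and `count_groups` are illustrative helpers, not Pig functions.

```python
def string_key(flow, delimiter="_"):
    """Flatten a flow (tuple of method names) into one string key."""
    return delimiter.join(flow)


def count_groups(flows, key):
    """Number of distinct groups produced when grouping on key(flow)."""
    return len({key(flow) for flow in flows})


# Hypothetical flows: different method sequences that share underscores.
flows = [
    ("User_Verify::m5(ulong)", "m1"),
    ("User", "Verify::m5(ulong)_m1"),
]

print(count_groups(flows, string_key))                     # 1: keys collide
print(count_groups(flows, lambda f: string_key(f, "|")))  # 2
print(count_groups(flows, tuple))                          # 2
```

Grouping on the element sequence itself (the tuple) avoids the problem entirely; when a string key is unavoidable, the delimiter has to be a character that cannot appear in any element.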