Apache Pig: extracting records whose column value is not distinct

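I want to extract the records whose value in a given column is not distinct, that is, records whose value in that column also appears in at least one other record. How can I do this?

Example input: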

(user1, value1, value2)
(user1, value3, value4)
(user2, value5, value6)
(user3, value7, value8)
(user4, value9, value10)
(user4, value11, value12)
After extracting the records that have a duplicated value in column 1, the output should be:

(user1, value1, value2)
(user1, value3, value4)
(user4, value9, value10)
(user4, value11, value12)

Thanks in advance.

Please let me know whether this works for you. For testing purposes I declared value1 and value2 as chararray; in your real code, change them to int or long as appropriate.

input.txt
user1,value1,value2
user1,value3,value4
user2,value5,value6
user3,value7,value8
user4,value9,value10
user4,value11,value12

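Pig script
-- Load the comma-separated input; value1/value2 are chararray here for testing only.
A = LOAD 'input.txt' USING PigStorage(',') AS (user:chararray, value1:chararray, value2:chararray);
-- Group the records by user, so each group carries a bag of that user's records.
B = GROUP A BY user;
-- Flatten each bag back into rows, attaching the group's record count to every row.
C = FOREACH B GENERATE FLATTEN(A), COUNT(A) AS cnt;
-- Keep only rows whose user appears more than once.
D = FILTER C BY cnt > 1;
-- Drop the helper count column.
E = FOREACH D GENERATE A::user, A::value1, A::value2;
DUMP E;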

Output:
(user1,value1,value2)
(user1,value3,value4)
(user4,value9,value10)
(user4,value11,value12)
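
As a possible variation on the script above, the filter can be applied to the groups before flattening, so the count never has to be carried on every row. This is only a sketch under the same assumptions as the original (the same 'input.txt' and schema); the relation names are arbitrary, and the order of the rows printed by DUMP is not guaranteed.

```pig
-- Load the same test input with the same schema as above.
A = LOAD 'input.txt' USING PigStorage(',') AS (user:chararray, value1:chararray, value2:chararray);
-- One row per user, each holding a bag of that user's records.
B = GROUP A BY user;
-- Keep only the groups whose bag holds more than one record.
C = FILTER B BY COUNT(A) > 1;
-- Expand the surviving bags back into individual records.
D = FOREACH C GENERATE FLATTEN(A);
DUMP D;
```

Note that COUNT skips tuples whose first field is null; if the first column can be null in the real data, COUNT_STAR may be the safer choice.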

@丹珠, if my answer helped, please mark this question as answered.