Top three records within a group in a Hadoop Pig query


Sample data for my question:

1       12      1234
2       12      1233
1       13      5555
1       15      4444
2       34      2222
7       89      1111

Field description:
col1 is cust_id, col2 is zip_code, col3 is transaction_id.

Using a Pig script, I need to answer the following question:

For each cust_id, find the zip code used most often in the last 3 transactions.

Approach I have used so far:

1) Group records by cust_id:

(1,{(1,12,1234),(1,13,5555),(1,15,4444),(1,12,3333),(1,13,2323),(1,13,3434),(1,13,5755),(1,18,4424),(1,12,3383),(1,13,2823)})
(2,{(2,34,2222),(2,12,1233),(2,34,6666),(2,34,6666),(2,34,2422)})
(6,{(6,14,2312),(6,15,8888),(6,14,4634),(6,14,2712),(6,15,8288)})
(7,{(7,45,4244),(7,89,1111),(7,45,4544),(7,89,1121)})
2) Sort each group and limit it to the last 3 transactions

Using a nested foreach, I sorted by transaction id and limited the result to 3:
nested = foreach group_by { sor = order zip by $2 desc; limi = limit sor 3; generate limi; };

After this step the data is:

({(1,12,1234),(1,13,2323),(1,13,2823)})
({(2,12,1233),(2,34,2222),(2,34,2422)})
({(6,14,2312),(6,14,2712),(6,14,4634)})
({(7,89,1111),(7,89,1121),(7,45,4244)})
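Why is the data above not sorted in descending order?

Even if it stays in ascending order, how do I now find the zip code used most often in the last 3 transactions?

The result should be: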
1) 13 
2) 34 
3) 14
4) 89
Can you try this?

PigScript:

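-- Load the comma-separated transactions.
A = LOAD 'input.txt' USING PigStorage(',') AS (CustomerId:int, ZipCode:int, TransactionId:int);
-- Per customer, keep the 3 rows with the highest transaction ids.
B = GROUP A BY CustomerId;
C = FOREACH B {
    SortTxnId = ORDER A BY $2 DESC;
    TxnIdLimit = LIMIT SortTxnId 3;
    GENERATE group, TxnIdLimit;
}
D = FOREACH C GENERATE FLATTEN($1);
-- Count how often each (customer, zip code) pair occurs in those rows.
E = GROUP D BY ($0, $1);
F = FOREACH E GENERATE group, COUNT(D);
-- Per customer, keep the pair with the highest count.
G = GROUP F BY group.$0;
I = FOREACH G {
    SortZipCode = ORDER F BY $1 DESC;
    ZipCodeLimit = LIMIT SortZipCode 1;
    GENERATE FLATTEN(ZipCodeLimit.group);
}
-- Project only the zip code.
J = FOREACH I GENERATE FLATTEN($0.TxnIdLimit::ZipCode);
DUMP J;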

Output:
(13)
(34)
(14)
(89)

input.txt:
1,12,1234
1,13,5555
1,15,4444
1,12,3333
1,13,5755
1,18,4424
2,34,2222
2,12,1233
2,33,6666
2,34,6666
2,34,2422
6,14,2312
6,15,8888
6,14,4634
6,14,2712
7,45,4244
7,89,1111
7,89,3111
7,89,1121
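
To cross-check the output above without running Pig, the same logic can be sketched in plain Python. This is only an illustration, not part of the original answer: the function name top_zip_per_customer and the ROWS list are made up here, the rows are copied from input.txt, and "last 3 transactions" is assumed to mean the three highest transaction ids, as in the Pig script. If two zip codes tie on count, the one seen first among the latest rows wins, which may differ from what Pig picks.

```python
from collections import Counter, defaultdict

# Rows copied from input.txt: (cust_id, zip_code, transaction_id).
ROWS = [
    (1, 12, 1234), (1, 13, 5555), (1, 15, 4444),
    (1, 12, 3333), (1, 13, 5755), (1, 18, 4424),
    (2, 34, 2222), (2, 12, 1233), (2, 33, 6666),
    (2, 34, 6666), (2, 34, 2422),
    (6, 14, 2312), (6, 15, 8888), (6, 14, 4634), (6, 14, 2712),
    (7, 45, 4244), (7, 89, 1111), (7, 89, 3111), (7, 89, 1121),
]


def top_zip_per_customer(rows, n=3):
    """Return {cust_id: most frequent zip code among the n highest transaction ids}."""
    # Step 1: group by customer (GROUP A BY CustomerId in the Pig script).
    by_customer = defaultdict(list)
    for cust_id, zip_code, txn_id in rows:
        by_customer[cust_id].append((txn_id, zip_code))

    result = {}
    for cust_id, txns in sorted(by_customer.items()):
        # Step 2: sort by transaction id, descending, and keep n rows.
        latest = sorted(txns, reverse=True)[:n]
        # Step 3: count zip codes in those rows and keep the most frequent one.
        counts = Counter(zip_code for _, zip_code in latest)
        result[cust_id] = counts.most_common(1)[0][0]
    return result


print(top_zip_per_customer(ROWS))
```

For the rows above this gives 13, 34, 14 and 89 for customers 1, 2, 6 and 7, which matches the DUMP J output.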