Apache Pig: How do I convert fields into bags and tuples in Pig?


I have a dataset with comma-separated values:

10,4,21,9,50,9,4,50    
50,78,47,7,4,7,4,50    
68,25,43,13,11,68,10,9 

I want to convert it into bags and tuples, as shown below:

({(10),(4),(21),(9),(50)},{(9),(4),(50)})
({(50),(78),(47),(7),(4)},{(7),(4),(50)})
({(68),(25),(43),(13),(11)},{(68),(10),(9)})

I tried the commands below, but they do not show any data.

grunt> dataset = load '/user/dataset' Using PigStorage(',') As (bag1:bag{t1:tuple(p1:int, p2:int, p3:int, p4:int, p5:int)}, bag2:bag{t2:tuple(p6:int, p7:int, p8:int)});

grunt> dump dataset;
Here is the output of dump:

2015-09-11 05:26:31,057 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 8 time(s).    
2015-09-11 05:26:31,057 [main] INFO      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2015-09-11 05:26:31,058 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-09-11 05:26:31,058 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-09-11 05:26:31,063 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-09-11 05:26:31,063 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1    
(,)    
(,)    
(,)    
(,)    
Please help. How can I convert the dataset into bags and tuples?

I found a solution.

I used the following commands:

grunt> dataset = load '/user/dataset' Using PigStorage(',') As (p1:int, p2:int, p3:int, p4:int, p5:int, p6:int, p7:int, p8:int);

grunt> dataset2 = Foreach dataset Generate TOBAG(p1, p2, p3, p4, p5) as bag1, TOBAG(p6, p7, p8) as bag2;

grunt> dump dataset2;
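
The original load most likely returned empty fields because PigStorage(',') splits each line into eight plain values, and values such as 10 and 4 cannot be cast to the bag types declared in the schema. Loading them as int fields and building the bags afterwards avoids that. As a rough sketch, the script below contrasts two shapes: TOBAG on the individual fields (one single-field tuple per value, which is the output asked for above) and TOBAG wrapped around TOTUPLE (one multi-field tuple per bag, which is closer to the schema declared in the failing load). The relation names per_value and per_row are made up for illustration; TOBAG, TOTUPLE, DESCRIBE and DUMP are Pig built-ins.

```pig
-- Load the eight comma-separated values as plain int fields,
-- exactly as in the working commands above.
dataset = LOAD '/user/dataset' USING PigStorage(',')
          AS (p1:int, p2:int, p3:int, p4:int, p5:int, p6:int, p7:int, p8:int);

-- Variant A: each value becomes its own single-field tuple.
-- For the first input row, bag1 should look like {(10),(4),(21),(9),(50)}.
per_value = FOREACH dataset GENERATE
                TOBAG(p1, p2, p3, p4, p5) AS bag1,
                TOBAG(p6, p7, p8)         AS bag2;

-- Variant B (hypothetical alternative): group the values into one tuple first,
-- then wrap that tuple in a bag.
-- For the first input row, bag1 should look like {(10,4,21,9,50)}.
per_row = FOREACH dataset GENERATE
              TOBAG(TOTUPLE(p1, p2, p3, p4, p5)) AS bag1,
              TOBAG(TOTUPLE(p6, p7, p8))         AS bag2;

-- Inspect the resulting schemas before dumping any data.
DESCRIBE per_value;
DESCRIBE per_row;

DUMP per_value;
DUMP per_row;
```

Running DESCRIBE before DUMP is a cheap way to check that the bag and tuple nesting is what you intended, since it only prints the schema and does not launch a job.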