Hadoop: How to combine multiple rows into a single row in Pig
Tags: hadoop, mapreduce, apache-pig
I need to combine multiple tuples into a single tuple using a Pig script. Can you provide some guidance?
dump requestFile;
Current output
(Logging Transaction ID:21214,/var/log/tibco/,NESS-A-1-LPNameRequesttoNESS.log,tibcoTest log)
(Default Data:LP Name Request Message Executed Successfully)
(LoanPath Request ID: 88128640)
(RequestGroupID#: )
(SplitCount#: 2 )
(SplitIndex: 1)
(Correlation ID : 88128640-1 )
Required output
(Logging Transaction ID:21214,/var/log/tibco/,NESS-A-1-LPNameRequesttoNESS.log,tibcoTest log,Default Data:LP Name Request Message Executed Successfully,LoanPath Request ID: 88128640,RequestGroupID#: ,SplitCount#: 2,SplitIndex: 1)
(Correlation ID : 88128640-1 )
Then:
requestFile = foreach requestFile generate flatten(tuple);
G = GROUP requestFile ALL;
F = FOREACH G generate requestFile;
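What is the criterion for grouping a set of rows? Is this data the result of a group operation in Pig? If so, perhaps you should group the rows at that point in your Pig script. You could also try a Pig UDF (written in Java) tailored to your needs.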
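One possible way to get close to the required output without a custom UDF is sketched below. It rests on several assumptions that the question does not confirm: that the first field of each row can be read as a chararray, that the row to keep separate can be recognised by the prefix `Correlation ID`, and that the built-in `BagToTuple` is available (Pig 0.11 or later). The aliases `corr`, `rest`, `grouped`, `merged`, and `result` are hypothetical names introduced for illustration.

```pig
-- Keep the Correlation ID row aside (the prefix pattern is an assumption
-- taken from the sample output, not a rule stated in the question).
corr = FILTER requestFile BY (chararray)$0 MATCHES 'Correlation ID.*';
rest = FILTER requestFile BY NOT ((chararray)$0 MATCHES 'Correlation ID.*');

-- Collect every remaining row into a single bag.
grouped = GROUP rest ALL;

-- BagToTuple concatenates the fields of all tuples in the bag into one tuple;
-- FLATTEN lifts that tuple to the top level so it is not nested.
merged = FOREACH grouped GENERATE FLATTEN(BagToTuple(rest));

-- Put the merged row and the untouched row back into one relation.
result = UNION merged, corr;
DUMP result;
```

Pig does not guarantee the order of tuples inside the bag produced by `GROUP ... ALL`, so the fields of the merged row may not appear in the original line order. If the order matters, the rows would need an explicit sort key (for example a line number added when the data is loaded) and a nested `ORDER BY` inside the `FOREACH`.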