Apache Pig: splitting a tuple with multiple fields into tuples with a single field


I have tuples of varying lengths. I am trying to convert them into tuples that each have only one field (every field is a map).
Original data:

dump entryArray;
([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585],[symbol#RFG,security_type#ETF,foreign_entry_id#5586],[symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587],[symbol#VWO,security_type#ETF,foreign_entry_id#5588])
I would like the output to be (each field should still be a map):


I have already tried:
entry = FOREACH entryArray GENERATE FLATTEN(TOBAG()); The output has the same format, but the fields no longer seem to be maps:

entry = FOREACH entryArray GENERATE FLATTEN(TOBAG());
dump entry;
([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585])   
([symbol#RFG,security_type#ETF,foreign_entry_id#5586])
([symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587])
([symbol#VWO,security_type#ETF,foreign_entry_id#5588])

security_type = FOREACH entry GENERATE FLATTEN($0#'security_type');
it throws:
ERROR 1052: Cannot cast bytearray to map with schema :map
org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1059: <line 18, column 16> Problem while reconciling output schema of ForEach
at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.throwTypeCheckerException(TypeCheckingRelVisitor.java:141)
at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:181)
at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:75)
......

Any suggestions would be greatly appreciated. Thanks.

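Can you provide the schema for the "original data"? I'm not sure, but I think you have to split the tuple on ",". See the link:
The schema of the original data is too complex to provide... I simplified it here. Since each field is a map, I can't use STRSPLIT.
How are you creating the tuple of maps? If the length of the tuple is not fixed, you should use a bag instead (especially since you need to flatten it).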
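A minimal sketch of the bag-based approach suggested in the last comment, not a confirmed fix for the asker's pipeline. It assumes the maps can be loaded as a bag with a declared schema instead of a variable-length tuple. The file name entries.txt, the relation name entryBags, and the field names entries, t, and m are hypothetical. Because the bag's element type is declared as a map, the flattened field keeps the map type and the # lookup should not need a cast from bytearray.

```pig
-- Hypothetical input file: one bag per line, each element a
-- single-field tuple holding a map, for example:
-- {([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585]),([symbol#RFG,security_type#ETF,foreign_entry_id#5586])}
entryBags = LOAD 'entries.txt' AS (entries: bag{t: tuple(m: map[])});

-- Flatten the bag: one output row per map, with the map type declared.
entry = FOREACH entryBags GENERATE FLATTEN(entries) AS (m: map[]);

-- Look up a key on the typed map field.
security_type = FOREACH entry GENERATE m#'security_type' AS security_type;

DUMP security_type;
```

Running DESCRIBE entry; before the lookup is a quick way to check whether the flattened field is typed as a map or has fallen back to bytearray.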