Apache Pig: FLATTEN on a bag does not work as expected

Input: a .csv file containing map data

[banks#{(bofa),(chase)}]

Pig script:

A = LOAD 'a.csv' AS (bank_details:map[]);
B = FOREACH A GENERATE FLATTEN(bank_details#'banks') AS bank_name;

Output: B:

({(bofa),(chase)})

Flatten the bag:

C = FOREACH A GENERATE bank_details#'banks' AS banks: bag{t:(bank:chararray)};
D = FOREACH C GENERATE FLATTEN(banks);

Output: D:

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POProject (Name: Project[bag][0] - scope-114 Operator Key: scope-114) children: null at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:366)

Expected output:

(bofa)
(chase)

If the input file contains a bag directly, as shown below:

Input: a.csv

{(bofa),(chase)}

Pig script:

A = LOAD 'a.csv' AS (bank_details:bag{t:(bank_name:chararray)});
B = FOREACH A GENERATE FLATTEN(bank_details) AS bank_name;

Output: B: produces the flattened result

(bofa)
(chase)

Any input on why we are not able to flatten the bag in aliases C and D?

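The problem here is that when you do not specify a schema for the map, its values default to bytearray. So when Pig tries to cast the value to a DataBag, you get a ClassCastException, because a DataByteArray cannot be cast to a DataBag. If you run dump on C, it still works, because you are not performing any real operation on the data, only projecting it. However, as soon as you call the FLATTEN function, it expects to receive a DataBag and fails when it tries to cast the bytearray to one.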
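
One way to check the inferred types yourself is Pig's DESCRIBE operator, which prints the schema of an alias without running the job. The following is only a minimal sketch, assuming the same 'a.csv' file and untyped map declaration as in the question; the alias name B and the field name banks are used here purely for illustration.

```pig
-- Sketch: inspect the schema Pig infers at each step.
-- Assumes the 'a.csv' file and the untyped map from the question.
A = LOAD 'a.csv' AS (bank_details:map[]);
DESCRIBE A;  -- the map is declared without a value type

-- Look up the 'banks' key without flattening it.
B = FOREACH A GENERATE bank_details#'banks' AS banks;
DESCRIBE B;  -- the looked-up value is reported as bytearray, not as a bag
```

If the looked-up field shows up as bytearray, FLATTEN has no bag to unnest, which matches the behavior described above.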

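The reason it works in the second case is that you correctly specified the schema of the map, which is a bag, so it does not get the default type, bytearray:
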
A = LOAD 'a.csv' AS (bank_details:bag{t:(bank_name:chararray)});
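
EDIT

Sorry, I had not noticed that in the second case you are not using a map but a bag directly. If you want to use a map, simply specify its schema to avoid the problem described above: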

A = LOAD 'a.csv' AS (bank_details:map[{(name:chararray)}]);
B = FOREACH A GENERATE FLATTEN(bank_details#'banks') AS bank_name;

dump B;
(bofa)
(chase)

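Agreed. In my use case, though, I have a map whose values are of different data types; it is not a map in which every value is a bag. Any input on explicit casting, like what I tried with alias C: C = FOREACH A GENERATE bank_details#'banks' AS banks: bag{t:(bank:chararray)};

@穆拉利奥 I'm not sure I understand what you mean... You have a map with values of different types, right? You cannot explicitly cast from bytearray to bag; the two classes have nothing in common, which is why you get an exception. The only way to solve this is to specify a schema for the map. If the values can differ, can't you put all of them in bags? I mean a map with bags inside: a = LOAD 'a.csv' AS (bank_details:map[bag{t:(name:chararray)}]);

Yes, I mean a map whose values have different data types, for example key1#2, key2#{(bofa),(chase)}. I understand that we cannot cast a bytearray to a bag. I will try making all the key values bags. Thanks for your input!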