Apache Pig: splitting a tuple into multiple tuples using a string array


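I have a set of tuples, for example

A = 
(1, ["Football","Baseball"])
(2, ["Swimming","Baseball"])

I want to split each tuple on its string array so that the final result looks like this

(1, "Football")
(1, "Baseball")
(2, "Swimming")
(2, "Baseball")

How can I do this in Pig?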

First use the REPLACE function to remove the '[' and ']' characters from the input, then wrap the results in a bag (TOBAG) and flatten it (FLATTEN).

Input:

1,["Football","Baseball"]
2,["Swimming","Baseball"]

PigScript:

A = LOAD 'input' USING PigStorage(',') AS (f1:int,f2:chararray,f3:chararray);
B = FOREACH A GENERATE f1,FLATTEN(TOBAG(REPLACE(f2,'[\\[\\]]',''),REPLACE(f3,'[\\[\\]]','')));
DUMP B;

Output:

(1,"Football")
(1,"Baseball")
(2,"Swimming")
(2,"Baseball")

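@西瓦萨吉 Actually there are not 3 columns, there are 2: the first is the id and the second is an array of the games liked by the person with that id. It is an array of strings, not a string.

@user12331 Pig has no array data type, so the input above is treated as a plain string rather than an array of strings (Pig has no way of knowing that it is meant to be a string array). When you apply ',' as the delimiter, Pig splits your input into (col1=1, col2=["Football", col3="Baseball"]).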
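
Because the script in the answer declares exactly three fields, it only covers rows whose list has two entries. If the list can have any number of entries, one possible approach is to load each line whole, split it at the first comma only, and tokenize the rest. This is a minimal sketch, not a tested solution: the relation and field names (raw, parts, id, games, game) are made up for illustration, and it assumes your Pig version supports the optional limit argument to STRSPLIT and the optional delimiter argument to TOKENIZE.

```pig
-- Load every line as a single chararray so the commas inside the list are not split yet.
raw = LOAD 'input' USING TextLoader() AS (line:chararray);

-- Split at the first comma only: the id on the left, the bracketed list on the right.
parts = FOREACH raw GENERATE
            FLATTEN(STRSPLIT(line, ',', 2)) AS (id:chararray, games:chararray);

-- Strip '[' and ']', cut the remainder on commas into a bag, then flatten the bag
-- so that each list entry becomes its own row next to the id.
B = FOREACH parts GENERATE
        (int)id AS id,
        FLATTEN(TOKENIZE(REPLACE(games, '[\\[\\]]', ''), ',')) AS game;

DUMP B;
```

For the sample input this should give the same four rows as the answer above, and a row with three or more games should simply produce more output rows. It still treats the list as a plain string, so a game name that itself contains a comma would be split incorrectly.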