Hadoop: bag and tuple schema in Pig

I am trying to use JsonLoader to specify a schema for some data I want to load. The data has this format:

Features:["Speedy","New","Automatic",..]

The number of features is not fixed; it varies from record to record. I expressed this in the schema as:

Features: bag{a: tuple(t:chararray)}

But it doesn't work. Can someone help me with the correct syntax and point out where I went wrong?

You don't need to specify field names, because your data is a simple array whose elements have no field names. Try this:

a = load 'a.json' using JsonLoader('value:int,feature:{(chararray)}');

JSON file:

{"value":1, "feature":[1, 2, 3] }
{"value":2, "feature":[2,3,4]}
{"value":3, "feature":[12,13,14]}
{"value":4, "feature":[2]}

Output:

(1,{(1),(2),(3)})
(2,{(2),(3),(4)})
(3,{(12),(13),(14)})
(4,{(2)})
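To make the mapping concrete: each JSON array becomes a bag in which every element is wrapped in its own one-field tuple, which is why the schema `{(chararray)}` needs no inner field names. The Python sketch below only imitates the DUMP-style output shown above; it is an illustration, not how JsonLoader is implemented, and `to_pig_bag` and `dump_record` are hypothetical helper names.

```python
import json


def to_pig_bag(values):
    """Render a JSON array as a Pig-style bag of one-field tuples, e.g. {(1),(2)}.

    Illustration only: mimics how DUMP prints a bag, not Pig's internals.
    """
    return "{" + ",".join(f"({v})" for v in values) + "}"


def dump_record(line):
    """Format one JSON line as (value,{feature bag}), like the output above."""
    rec = json.loads(line)
    return f"({rec['value']},{to_pig_bag(rec['feature'])})"


# Same sample lines as the JSON file in the answer.
lines = [
    '{"value":1, "feature":[1, 2, 3] }',
    '{"value":2, "feature":[2,3,4]}',
    '{"value":3, "feature":[12,13,14]}',
    '{"value":4, "feature":[2]}',
]

for line in lines:
    print(dump_record(line))

# The question's string features take the same shape: one tuple per element.
print(to_pig_bag(["Speedy", "New", "Automatic"]))
```

Arrays of any length, including a single element as in the last record, map to a bag of the same shape. That is what lets one schema handle a variable number of features per record.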