Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/cmake/2.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
将JSON数组加载到Pig中_Json_Hadoop_Apache Pig_Hdfs_Bigdata - Fatal编程技术网

将JSON数组加载到Pig中

将JSON数组加载到Pig中,json,hadoop,apache-pig,hdfs,bigdata,Json,Hadoop,Apache Pig,Hdfs,Bigdata,我有一个json文件,格式如下 [ { "id": 2, "createdBy": 0, "status": 0, "utcTime": "Oct 14, 2014 4:49:47 PM", "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia", "longitude": 77.5983817, "latitude": 12

我有一个json文件,格式如下

[
  {
    "id": 2,
    "createdBy": 0,
    "status": 0,
    "utcTime": "Oct 14, 2014 4:49:47 PM",
    "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
    "longitude": 77.5983817,
    "latitude": 12.9832418,
    "createdDate": "Sep 16, 2014 2:59:03 PM",
    "accuracy": 5,
    "loginType": 1,
    "mobileNo": "0000005567"
  },
  {
    "id": 4,
    "createdBy": 0,
    "status": 0,
    "utcTime": "Oct 14, 2014 4:52:48 PM",
    "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
    "longitude": 77.5983817,
    "latitude": 12.9832418,
    "createdDate": "Oct 8, 2014 5:24:42 PM",
    "accuracy": 5,
    "loginType": 1,
    "mobileNo": "0000005566"
  }
]
当我尝试使用JsonLoader类将数据加载到pig时,我得到了一个错误,比如输入的意外结束:对象的预期关闭标记

a = LOAD '/user/root/jsoneg/exp.json' USING JsonLoader('id:int,createdBy:int,status:int,utcTime:chararray,placeName:chararray,longitude:double,latitude:double,createdDate:chararray,accuracy:double,loginType:double,mobileNo:chararray');
b = foreach a generate $0,$1,$2;
dump b;

我以前也遇到过类似的问题,后来我知道Pig JSON不支持多行JSON格式。它总是希望json输入必须是单行的

我建议您使用elephantbird json加载程序,而不是本机Jsonloader。这对于Jsons格式非常好

您可以从下面的链接下载JAR

http://www.java2s.com/Code/Jar/e/elephant.htm
我将您的输入格式更改为单线,并通过elephantbird加载,如下所示

input.json
{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]}

PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';

A = LOAD 'input.json ' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE FLATTEN($0#'test');
C = FOREACH B GENERATE FLATTEN($0) AS mymap;
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
DUMP D;

Output:
(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)

我以前也遇到过类似的问题,后来我知道Pig JSON不支持多行JSON格式。它总是希望json输入必须是单行的

我建议您使用elephantbird json加载程序,而不是本机Jsonloader。这对于Jsons格式非常好

您可以从下面的链接下载JAR

http://www.java2s.com/Code/Jar/e/elephant.htm
我将您的输入格式更改为单线,并通过elephantbird加载,如下所示

input.json
{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]}

PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';

A = LOAD 'input.json ' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE FLATTEN($0#'test');
C = FOREACH B GENERATE FLATTEN($0) AS mymap;
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
DUMP D;

Output:
(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)

我以前也遇到过类似的问题,后来我知道Pig JSON不支持多行JSON格式。它总是希望json输入必须是单行的

我建议您使用elephantbird json加载程序,而不是本机Jsonloader。这对于Jsons格式非常好

您可以从下面的链接下载JAR

http://www.java2s.com/Code/Jar/e/elephant.htm
我将您的输入格式更改为单线,并通过elephantbird加载,如下所示

input.json
{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]}

PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';

A = LOAD 'input.json ' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE FLATTEN($0#'test');
C = FOREACH B GENERATE FLATTEN($0) AS mymap;
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
DUMP D;

Output:
(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)

我以前也遇到过类似的问题,后来我知道Pig JSON不支持多行JSON格式。它总是希望json输入必须是单行的

我建议您使用elephantbird json加载程序,而不是本机Jsonloader。这对于Jsons格式非常好

您可以从下面的链接下载JAR

http://www.java2s.com/Code/Jar/e/elephant.htm
我将您的输入格式更改为单线,并通过elephantbird加载,如下所示

input.json
{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]}

PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';

A = LOAD 'input.json ' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE FLATTEN($0#'test');
C = FOREACH B GENERATE FLATTEN($0) AS mymap;
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
DUMP D;

Output:
(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)

您的json文件是否格式化为在单行上有一个json对象?我认为pig试图逐行解释json,而您的json被拆分为多行。您的json文件是否格式化为在单行上有单个json对象?我认为pig试图逐行解释json,而您的json被拆分为多行。您的json文件是否格式化为在单行上有单个json对象?我认为pig试图逐行解释json,而您的json被拆分为多行。您的json文件是否格式化为在单行上有单个json对象?我认为pig试图逐行解释json,而您的json被拆分为多行。