Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/json/15.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Java Hive/ElasticMapreduce:如何让JsonSerDe忽略格式错误的JSON?_Java_Json_Hadoop_Hive_Elastic Map Reduce - Fatal编程技术网

Java Hive/ElasticMapreduce:如何让JsonSerDe忽略格式错误的JSON?

Java Hive/ElasticMapreduce:如何让JsonSerDe忽略格式错误的JSON?,java,json,hadoop,hive,elastic-map-reduce,Java,Json,Hadoop,Hive,Elastic Map Reduce,我对Hive和ElasticMapreduce还不太熟悉,目前我遇到了一个特殊的问题。 在包含数十亿行JSON对象的表上运行配置单元语句时,只要其中只有一行JSON无效/格式不正确,MapReduce作业就会崩溃 例外情况: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"ip":"3948813

我对Hive和ElasticMapreduce还不太熟悉,目前我遇到了一个特殊的问题。 在包含数十亿行JSON对象的表上运行配置单元语句时,只要其中只有一行JSON无效/格式不正确,MapReduce作业就会崩溃

例外情况:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"ip":"39488130","cdate":"2012-08-09","cdate_ts":"2012-08-09 17:06:41","country":"SA","city":"Riyadh","mid":"6666276582211270592","osversion":"5.1.
1
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"ip":"39488130","cdate":"2012-08-09","cdate_ts":"2012-08-09 17:06:41","country":"SA","city":"Riyadh","mid":"6666276582211270592","osversion":"5.1.1
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more
Caused by: com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Unterminated string near
at com.google.gson.Streams.parse(Streams.java:51)
at com.google.gson.JsonParser.parse(JsonParser.java:83)
at com.google.gson.JsonParser.parse(JsonParser.java:58)
at com.google.gson.JsonParser.parse(JsonParser.java:44)
at com.amazon.elasticmapreduce.JsonSerde.deserialize(Unknown Source)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:510)
... 9 more
Caused by: com.google.gson.stream.MalformedJsonException: Unterminated string near
at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1110)
at com.google.gson.stream.JsonReader.nextString(JsonReader.java:967)
at com.google.gson.stream.JsonReader.nextValue(JsonReader.java:802)
at com.google.gson.stream.JsonReader.objectValue(JsonReader.java:782)
at com.google.gson.stream.JsonReader.quickPeek(JsonReader.java:377)
at com.google.gson.stream.JsonReader.peek(JsonReader.java:340)
at com.google.gson.Streams.parseRecursive(Streams.java:60)
at com.google.gson.Streams.parseRecursive(Streams.java:83)
at com.google.gson.Streams.parse(Streams.java:40)
... 14 more
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
column1 string,
column2 string
)
PARTITIONED BY (year string, month string)
ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde'
WITH SERDEPROPERTIES ('paths'='c1, c2')
LOCATION 's3://mybucket/table1';
我这样创建表:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"ip":"39488130","cdate":"2012-08-09","cdate_ts":"2012-08-09 17:06:41","country":"SA","city":"Riyadh","mid":"6666276582211270592","osversion":"5.1.
1
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"ip":"39488130","cdate":"2012-08-09","cdate_ts":"2012-08-09 17:06:41","country":"SA","city":"Riyadh","mid":"6666276582211270592","osversion":"5.1.1
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more
Caused by: com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Unterminated string near
at com.google.gson.Streams.parse(Streams.java:51)
at com.google.gson.JsonParser.parse(JsonParser.java:83)
at com.google.gson.JsonParser.parse(JsonParser.java:58)
at com.google.gson.JsonParser.parse(JsonParser.java:44)
at com.amazon.elasticmapreduce.JsonSerde.deserialize(Unknown Source)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:510)
... 9 more
Caused by: com.google.gson.stream.MalformedJsonException: Unterminated string near
at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1110)
at com.google.gson.stream.JsonReader.nextString(JsonReader.java:967)
at com.google.gson.stream.JsonReader.nextValue(JsonReader.java:802)
at com.google.gson.stream.JsonReader.objectValue(JsonReader.java:782)
at com.google.gson.stream.JsonReader.quickPeek(JsonReader.java:377)
at com.google.gson.stream.JsonReader.peek(JsonReader.java:340)
at com.google.gson.Streams.parseRecursive(Streams.java:60)
at com.google.gson.Streams.parseRecursive(Streams.java:83)
at com.google.gson.Streams.parse(Streams.java:40)
... 14 more
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
column1 string,
column2 string
)
PARTITIONED BY (year string, month string)
ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde'
WITH SERDEPROPERTIES ('paths'='c1, c2')
LOCATION 's3://mybucket/table1';
我能做些什么来防止撞车?忽略格式错误的JSON对象/字符串是可以的,因为它是格式错误的数十亿个对象中的一个

提前谢谢你的帮助。
最好的,Sascha基本上,发生上述错误是因为JSON字符串无效。 试着解决这个问题


另外,为了避免应用程序崩溃,请在try-catch块中捕获该异常并进一步处理。这样应用程序就不会崩溃。

来自Apache的JsonSerDe似乎忽略了格式错误的JSON字符串。。。

通过更改行格式中使用的类并添加“畸形”属性,您可以在畸形JSON上创建表:

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
LOCATION ...

使用“hive site.xml”中的“hive.aux.jars.path”属性或“ADD JAR”配置单元指令包含JAR。您可以找到JAR,也可以从中编译它。

该项目似乎不支持高于0.10的配置单元版本: