
Apache Spark: query fails in pyspark's HiveContext when writing in Avro format


I am trying to load an external table in Avro format using pyspark's HiveContext. The external-table creation query runs fine in Hive, but the same query fails in HiveContext with the error:

org.apache.hadoop.hive.serde2.SerDeException: Encountered exception determining schema. Returning signal schema to indicate problem: null

My Avro schema is as follows:

{
  "type" : "record",
  "name" : "test_table",
  "namespace" : "com.ent.dl.enh.test_table",
  "fields" : [ {
    "name" : "column1",
    "type" : [ "null", "string" ] , "default": null
  }, {
    "name" : "column2",
    "type" : [ "null", "string" ] , "default": null
  }, {
    "name" : "column3",
    "type" : [ "null", "string" ] , "default": null
  }, {
    "name" : "column4",
    "type" : [ "null", "string" ] , "default": null
  } ]
}
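A null schema from the AvroSerDe often means the schema file itself could not be read or parsed, so it is worth first confirming that the `.avsc` content is valid JSON with the expected shape. A minimal sanity check (stdlib only, with the schema pasted inline for illustration) might look like:

```python
import json

# The schema text from test_table.avsc, pasted inline for the check.
schema_text = """
{
  "type" : "record",
  "name" : "test_table",
  "namespace" : "com.ent.dl.enh.test_table",
  "fields" : [
    { "name" : "column1", "type" : [ "null", "string" ], "default": null },
    { "name" : "column2", "type" : [ "null", "string" ], "default": null },
    { "name" : "column3", "type" : [ "null", "string" ], "default": null },
    { "name" : "column4", "type" : [ "null", "string" ], "default": null }
  ]
}
"""

schema = json.loads(schema_text)  # raises ValueError if the file is not valid JSON
assert schema["type"] == "record"
assert len(schema["fields"]) == 4
print("schema parses: %s.%s" % (schema["namespace"], schema["name"]))
```

This does not validate Avro semantics, only that the JSON is well formed; a stray trailing comma or BOM in the uploaded `.avsc` would be caught here.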
My create-table script is:

CREATE EXTERNAL TABLE test_table_enh
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://Staging/test_table/enh'
TBLPROPERTIES ('avro.schema.url'='s3://Staging/test_table/test_table.avsc')
I run the code below using spark-submit:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

print("Start of program")
sc = SparkContext()
hive_context = HiveContext(sc)

hive_context.sql("""
    CREATE EXTERNAL TABLE test_table_enh
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION 's3://Staging/test_table/enh'
    TBLPROPERTIES ('avro.schema.url'='s3://Staging/test_table/test_table.avsc')
""")

print("end")
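One plausible cause of this SerDeException is that Spark's embedded Hive client cannot fetch the schema from the `avro.schema.url` S3 location (for example, missing S3 filesystem configuration or credentials on the Spark side), even though Hive itself can. A workaround worth trying is embedding the schema directly in the DDL via `avro.schema.literal` instead of `avro.schema.url`, which removes the remote fetch entirely. A sketch of building such a statement (the `build_create_table` helper is illustrative, not part of the original code):

```python
import json

def build_create_table(table_name, location, schema_dict):
    """Build an Avro external-table DDL that embeds the schema inline
    via avro.schema.literal, avoiding the remote avro.schema.url fetch."""
    schema_literal = json.dumps(schema_dict)
    return (
        "CREATE EXTERNAL TABLE {table} "
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' "
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' "
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' "
        "LOCATION '{location}' "
        "TBLPROPERTIES ('avro.schema.literal'='{schema}')"
    ).format(table=table_name, location=location, schema=schema_literal)

# Same four nullable string columns as the .avsc above.
schema = {
    "type": "record",
    "name": "test_table",
    "namespace": "com.ent.dl.enh.test_table",
    "fields": [
        {"name": c, "type": ["null", "string"], "default": None}
        for c in ("column1", "column2", "column3", "column4")
    ],
}

ddl = build_create_table("test_table_enh", "s3://Staging/test_table/enh", schema)
# Then, inside the Spark job: hive_context.sql(ddl)
```

If the inline-literal variant succeeds where the URL variant fails, that points at S3 access from Spark's Hive client rather than at the schema itself.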
Spark version: 2.2.0, OpenJDK version: 1.8.0, Hive version: 2.3.0