Hadoop Hive on Spark: reading Parquet files
Tags: hadoop, hive, avro, parquet, spark-avro

I'm trying to read Parquet files into Hive on Spark, and I found that I should do something like this:
CREATE TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='/files/events/avro_events_scheme.avsc');

CREATE EXTERNAL TABLE parquet_test
LIKE avro_test
STORED AS PARQUET
LOCATION '/files/events/parquet_events/';
My Avro schema is:
{
  "type" : "parquet_file",
  "namespace" : "events",
  "name" : "events",
  "fields" : [
    { "name" : "category" , "type" : "string" },
    { "name" : "duration" , "type" : "long" },
    { "name" : "name" , "type" : "string" },
    { "name" : "user_id" , "type" : "string" },
    { "name" : "value" , "type" : "long" }
  ]
}
When I run this, I get an error:
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: ROW FORMAT SERDE is incompatible with format 'avro',
which also specifies a serde (line 1, pos 0)
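This ParseException arises because `STORED AS AVRO` already implies the Avro serde, so Spark SQL rejects an additional explicit `ROW FORMAT SERDE` clause. A minimal sketch of the same table with the redundant clause dropped (assuming the same schema file path as above):

```sql
-- STORED AS AVRO already binds the Avro serde, so under Spark SQL
-- the explicit ROW FORMAT SERDE clause can simply be omitted.
CREATE TABLE avro_test
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='/files/events/avro_events_scheme.avsc');
```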
Thanks a lot, that helped. But now the line

CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION '/dir_to_file/file_name.parq/'

returns an error: SQL Error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'LIKE' expecting {'(', 'SELECT', 'FROM', 'AS', ...}. Could you help with this as well?

You can refer to this link. Set the Parquet LOCATION to '/dir_to_file' and leave out file_name.parq; Hive does not actually need the *.parq file in the path.

It still returns the same error about 'LIKE'. Could the query be failing because of the mistake in the previous step?
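The "mismatched input 'LIKE'" message appears to be another parser limitation: Spark SQL does not accept `CREATE EXTERNAL TABLE ... LIKE` combined with `STORED AS`/`LOCATION`. A hedged workaround sketch that lists the columns from the Avro schema above explicitly instead of using LIKE:

```sql
-- Sketch: spell out the columns (taken from the Avro schema in the
-- question) instead of CREATE TABLE ... LIKE, which the parser rejects.
CREATE EXTERNAL TABLE parquet_test (
  category STRING,
  duration BIGINT,
  name     STRING,
  user_id  STRING,
  value    BIGINT
)
STORED AS PARQUET
LOCATION '/files/events/parquet_events/';
```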
I think we have to add the INPUTFORMAT and OUTPUTFORMAT classes:
CREATE TABLE parquet_test
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='/hadoop/avro_events_scheme.avsc');
I hope the above works.
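Once the table is created this way, a quick sanity check might look like the following (assuming data is already present at the table's location):

```sql
-- Hypothetical smoke test: read a few rows back through the new table
-- to confirm the schema and the serde line up with the files on disk.
SELECT category, duration, name, user_id, value
FROM parquet_test
LIMIT 10;
```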