Sql 配置单元:无法从配置单元表中的文件插入数组和映射
这是我所拥有的表的模式Sql 配置单元:无法从配置单元表中的文件插入数组和映射,sql,hadoop,hive,hql,Sql,Hadoop,Hive,Hql,这是我所拥有的表的模式 CREATE DATABASE IF NOT EXISTS mydb; USE mydb; CREATE TABLE IF NOT EXISTS mytab ( idcol string, arrcol array<string>, mapcol map<string,string> ) PARTITIONED BY (data_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY
CREATE DATABASE IF NOT EXISTS mydb;
USE mydb;
CREATE TABLE IF NOT EXISTS mytab (
idcol string,
arrcol array<string>,
mapcol map<string,string>
)
PARTITIONED BY (data_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
看起来加载对数据进行了一些转换。它保留了字符串值,但是数组和映射存在一些问题
如何正确插入阵列和映射
我还尝试了以下内容作为输入
123 |数组(“a”、“b”){“1”:“a”、“2”:“b”}
加载成功,但当我查询数据时,我得到了
root@0d2b0044b4c1:/opt# hive -e "use mydb;select * from mytab;"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
OK
Time taken: 6.096 seconds
OK
123 ["array(\"a\",\"b\")"] {"{\"1\":\"a\",\"2\":\"b\"}":null} 1554090675
Time taken: 3.266 seconds, Fetched: 1 row(s)
更新
非常感谢pedram bashiri的回答。我创建了外部表并能够填充它。但是,所有内容都以字符串形式填充
hive> drop table if exists extab;
OK
Time taken: 0.01 seconds
hive> create external table extab(idcol string,arrcol array<string>,mapcol map<string,string>, data_date string)
> row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> with serdeproperties (
> "separatorChar" = "|",
> "quoteChar" = "\"",
> "escapeChar" = "\\"
> )
> stored as textfile
> location '/tmp/serdes/';
OK
Time taken: 0.078 seconds
hive> desc extab;
OK
idcol string from deserializer
arrcol string from deserializer
mapcol string from deserializer
data_date string from deserializer
Time taken: 0.059 seconds, Fetched: 4 row(s)
hive> select * from extab;
OK
123 ["a","b"] {"1":"a","2":"b"} 2019
Time taken: 0.153 seconds, Fetched: 1 row(s)
hive>
我也试过了
create external table extab(idcol string,arrcol array<string>,mapcol map<string,string>, data_date string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
"separatorChar" = "|"
)
stored as textfile
location '/tmp/serdes/';
创建外部表extab(idcol字符串、arrcol数组、mapcol映射、数据\日期字符串)
行格式serde'org.apache.hadoop.hive.serde2.OpenCSVSerde'
具有serdeproperty(
“separatorChar”=“|”
)
存储为文本文件
位置“/tmp/serdes/”;
但是,所有内容都存储为字符串,因此现在我在尝试插入时会发现类型不匹配。用于根据psv文件创建一个外部表,并将其称为mytab\u external。指定SerdeProperty,如
with serdeproperties (
"separatorChar" = "|",
"quoteChar" = """,
"escapeChar" = "\\"
)
然后简单地做
INSERT INTO mytab
SELECT * FROM mytab_external;
所以经过大量挖掘,我找到了答案
CREATE TABLE IF NOT EXISTS mytab (
idcol string,
arrcol array<string>,
mapcol map<string,string>
)
PARTITIONED BY (data_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE;
这正是我需要的非常感谢这些建议。请看我的更新。外部表中的所有内容都以字符串形式填充,因此由于类型不匹配,无法插入到表中
INSERT INTO mytab
SELECT * FROM mytab_external;
CREATE TABLE IF NOT EXISTS mytab (
idcol string,
arrcol array<string>,
mapcol map<string,string>
)
PARTITIONED BY (data_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE;
root@0d2b0044b4c1:/opt# hive -e "use mydb; LOAD DATA INPATH '/path/to/file' INTO TABLE mytab PARTITION (data_date='2019');"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
OK
Time taken: 6.456 seconds
Loading data to table mydb.mytab partition (data_date=2019)
OK
Time taken: 1.912 seconds
root@0d2b0044b4c1:/opt# hive -e "use mydb; select * from mytab;"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
OK
Time taken: 6.843 seconds
OK
123 ["a","b"] {"1":"a","2":"b"} 2019
root@0d2b0044b4c1:/opt#