XML schema to Hive schema
I am trying to load an XML file into a Hive table using XMLSerde. I can load simple, flat XML files. When the XML has nested elements, however, I use Hive complex data types (for example, array) to store them. Below is the sample XML I am trying to load. My goal is to load all elements, attributes, and content into a Hive table.
<description action="up">
<name action="aorup" ln="te">
this is name1
</name>
<name action="aorup" ln="tm">
this is name2
</name>
<name action="aorup" ln="hi">
this is name2
</name>
</description>
I want to load the entire XML document into a single Hive column. I tried the following:
CREATE TABLE description(
description STRUCT<
Action:STRING,
name:ARRAY<STRUCT<
Action:STRING, ln:STRING, content:STRING
>>
>)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"xml.processor.class"="com.ximpleware.hive.serde2.xml.vtd.XmlProcessor",
"column.xpath.description"="/description")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES ("xmlinput.start"="<description ","xmlinput.end"= "</description>");
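Thank you very much for your answer. Could you explain the create table statement? It confuses me. I tried your solution on another nested XML schema from this question, but could not get it to work. Could you explain my mistake?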
create external table description
(
description struct<action:string,description:array<struct<action:string,ln:string,name:string>>>
)
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties
(
"column.xpath.description" = "/description"
)
stored as
inputformat 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
tblproperties
(
"xmlinput.start" = "<description "
,"xmlinput.end" = "</description>"
)
;
select * from description
;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| description |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {"action":"up","description":[{"action":"aorup","ln":"te","name":"this is name1"},{"action":"aorup","ln":"tm","name":"this is name2"},{"action":"aorup","ln":"hi","name":"this is name2"}]} |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
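Since the whole document ends up in one struct column, the nested values can be read back with ordinary Hive struct and array access. The query below is a minimal sketch, assuming the `description` table defined above; the aliases `d`, `t`, `n` and the output column names are illustrative and not part of the original answer. It uses LATERAL VIEW with explode to produce one row per `name` element.

```sql
-- Flatten the nested array: one output row per <name> element.
-- d.description             -> the struct column
-- d.description.action      -> the "action" attribute of <description>
-- d.description.description -> the array of <name> structs
select
    d.description.action as description_action  -- attribute of the outer element
   ,n.action             as name_action         -- "action" attribute of <name>
   ,n.ln                 as name_ln             -- "ln" attribute of <name>
   ,n.name               as name_text           -- text content of <name>
from description d
lateral view explode(d.description.description) t as n
;
```

For the sample document this should return three rows, one for each `name` element, with the outer `action` value repeated on every row.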