Hadoop: Error when loading XML data into a Hive table
I am trying to load an XML file into my Hive table. Here is my CREATE TABLE statement:
CREATE TABLE MYDATA(NAME STRING, AGE INT, SEX STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES(
"column.xpath.NAME"="/TAG/NAME/text()",
"column.xpath.AGE"="/TAG/AGE/int()",
"column.xpath.SEX"="/TAG/SEX/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/home/sid/hivexmltab'
TBLPROPERTIES("xmlinput.start"="<TAG","xmlinput.end"="</TAG>");
I expected output like this:
ABCD,25,male
EFGH,23,female
But the result I get is:
<string>ABCDEFGH</string> NULL <string>malefemale</string>
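I'm using the jar file hivexmlserde-1.0.5.3.jar for the XML SerDe.
Can anyone tell me what mistake I am making here?
Thanks for your help.

Use text() everywhere. int() is not an XPath node test, which is most likely why AGE comes back as NULL. Change the AGE part to: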
"column.xpath.AGE"="/TAG/AGE/text()"
You can change the data type in the Hive table later.
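As a rough sketch of that later conversion, assuming AGE was declared as STRING in the CREATE TABLE statement (that declaration is an assumption, not something shown in the original post), the value can be cast at query time:

```sql
-- Assumes MYDATA was created with AGE STRING and
-- "column.xpath.AGE"="/TAG/AGE/text()"
SELECT NAME,
       CAST(AGE AS INT) AS AGE,  -- text that is not a number becomes NULL
       SEX
FROM MYDATA;
```

Casting in the query leaves the table definition untouched, so nothing has to be reloaded if the cast turns out to be wrong.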
Remove the LOCATION clause from the CREATE TABLE statement:
LOCATION '/home/sid/hivexmltab'
Instead, load the data with the LOAD command after creating the table:
load data local inpath '/home/sid/hivexmltab/XMLfile.xml' overwrite into table MYDATA;

This is a bad XML structure…
…..
Any combination of … should be wrapped in an additional tag.
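To illustrate the wrapping idea, here is a minimal sketch that assumes each person is enclosed in its own element. The element name PERSON and the table name MYDATA_WRAPPED are hypothetical and do not come from the original post. With one record per wrapper element, each XPath matches a single value, so the columns can stay scalar:

```sql
-- Hypothetical input, one wrapper element per person:
--   <PERSON><NAME>ABCD</NAME><AGE>25</AGE><SEX>male</SEX></PERSON>
--   <PERSON><NAME>EFGH</NAME><AGE>23</AGE><SEX>female</SEX></PERSON>
CREATE TABLE MYDATA_WRAPPED(NAME STRING, AGE INT, SEX STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES(
"column.xpath.NAME"="/PERSON/NAME/text()",
"column.xpath.AGE"="/PERSON/AGE/text()",
"column.xpath.SEX"="/PERSON/SEX/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
-- The input format cuts the file into records at these markers
TBLPROPERTIES("xmlinput.start"="<PERSON","xmlinput.end"="</PERSON>");
```

The start and end markers decide what counts as one row, which is why several NAME elements inside a single record end up concatenated.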
It worked. This really helped a lot in building the right table structure for loading the XML file.

CREATE EXTERNAL TABLE MYDATA
(
NAME array<string>
,AGE array<int>
,SEX array<string>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES
(
"column.xpath.NAME" = "TAG/NAME/text()"
,"column.xpath.AGE" = "TAG/AGE/text()"
,"column.xpath.SEX" = "TAG/SEX/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/home/sid/hivexmltab'
TBLPROPERTIES
(
"xmlinput.start" = "<TAG"
,"xmlinput.end" = "</TAG>"
)
;
select * from MYDATA
;
+-----------------+------------+-------------------+
| mydata.name     | mydata.age | mydata.sex        |
+-----------------+------------+-------------------+
| ["ABCD","EFGH"] | [25,23]    | ["male","female"] |
+-----------------+------------+-------------------+
select NAME[pe.n] as name
,AGE [pe.n] as age
,SEX [pe.n] as sex
from MYDATA m
lateral view posexplode (m.NAME) pe as n,x
;
+------+-----+--------+
| name | age | sex    |
+------+-----+--------+
| ABCD | 25  | male   |
| EFGH | 23  | female |
+------+-----+--------+
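
For anyone unfamiliar with posexplode: it emits one row per array element together with that element's zero-based position, and the position is what indexes the AGE and SEX arrays in the query above. Here is a small standalone sketch using literal arrays. The inline subquery t is made up for illustration, and the sketch assumes a Hive version that allows SELECT without a FROM clause:

```sql
-- t stands in for one row of MYDATA with two parallel arrays
SELECT pe.n            AS pos,
       pe.x            AS name,
       t.ages[pe.n]    AS age   -- look up the matching element by position
FROM (SELECT array('ABCD','EFGH') AS names,
             array(25,23)         AS ages) t
LATERAL VIEW posexplode(t.names) pe AS n, x;
-- position 0 pairs ABCD with 25, position 1 pairs EFGH with 23
```

This approach relies on the arrays being the same length and in the same order. If a person is missing one of the elements, the positions no longer line up.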