Hadoop: Error when loading XML data into a Hive table


I am trying to load an XML file into my Hive table. Below is my Hive table query:

CREATE TABLE MYDATA(NAME STRING, AGE INT, SEX STRING)
   ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
   WITH SERDEPROPERTIES(
   "column.xpath.NAME"="/TAG/NAME/text()",
   "column.xpath.AGE"="/TAG/AGE/int()",
   "column.xpath.SEX"="/TAG/SEX/text()")
   STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
   LOCATION '/home/sid/hivexmltab'
   TBLPROPERTIES("xmlinput.start"="<TAG","xmlinput.end"="</TAG>");
But the result I get is as follows:

ABCD,25,male
EFGH,23,female
<string>ABCDEFGH</string>   NULL    <string>malefemale</string>
I'm using the jar file hivexmlserde-1.0.5.3.jar for the XML SerDe.

Can someone tell me what mistake I am making here? Thanks for your help.

Use text() everywhere; change the AGE part to:

   "column.xpath.AGE"="/TAG/AGE/text()"
You can change the data type in the Hive table later.
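
One way to read "change the data type later" is to keep AGE as a string in the table and convert it at query time. The following is only a minimal sketch; it assumes MYDATA was created with AGE declared as STRING, which is not what the original statement does.

```sql
-- Assumption: MYDATA.AGE is declared as STRING and populated by "/TAG/AGE/text()".
-- CAST yields NULL for any value that is not a valid integer.
SELECT NAME,
       CAST(AGE AS INT) AS AGE,
       SEX
FROM   MYDATA;
```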

Remove the LOCATION part from the CREATE TABLE statement:

LOCATION '/home/sid/hivexmltab'
and instead load all the data with the LOAD command after the table has been created:

load data local inpath '/home/sid/hivexmltab/XMLfile.xml' overwrite into table MYDATA;
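
Putting the two suggestions together, the statements might look like the sketch below. It reuses the names and paths from the question, keeps AGE as INT as in the original statement, and assumes the hivexmlserde jar is already on the Hive classpath. Whether it returns one row per person still depends on how the XML file itself is structured.

```sql
-- Sketch: text() for every column, no LOCATION clause, data loaded afterwards.
-- Assumes hivexmlserde-1.0.5.3.jar is already available to the Hive session.
CREATE TABLE MYDATA(NAME STRING, AGE INT, SEX STRING)
   ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
   WITH SERDEPROPERTIES(
   "column.xpath.NAME"="/TAG/NAME/text()",
   "column.xpath.AGE"="/TAG/AGE/text()",
   "column.xpath.SEX"="/TAG/SEX/text()")
   STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
   TBLPROPERTIES("xmlinput.start"="<TAG","xmlinput.end"="</TAG>");

LOAD DATA LOCAL INPATH '/home/sid/hivexmltab/XMLfile.xml' OVERWRITE INTO TABLE MYDATA;
```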

This is a bad XML structure: any combination of ….. should be wrapped in an additional tag.
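
To illustrate why a wrapper tag helps: if each record had its own enclosing element, the SerDe could treat that element as one row and plain scalar columns would suffice. The sketch below is hypothetical; the wrapper tag PERSON and the table name MYDATA_WRAPPED are invented for illustration and do not appear in the original data.

```sql
-- Hypothetical: assumes every NAME/AGE/SEX group sits inside its own <PERSON> element.
-- PERSON and MYDATA_WRAPPED are made-up names, not part of the original question.
CREATE EXTERNAL TABLE MYDATA_WRAPPED (NAME STRING, AGE INT, SEX STRING)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
        "column.xpath.NAME" = "/PERSON/NAME/text()",
        "column.xpath.AGE"  = "/PERSON/AGE/text()",
        "column.xpath.SEX"  = "/PERSON/SEX/text()"
    )
    STORED AS
    INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION     '/home/sid/hivexmltab'
    TBLPROPERTIES (
        "xmlinput.start" = "<PERSON",
        "xmlinput.end"   = "</PERSON>"
    );
```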






It worked. This really helped a lot in building the right table structure for loading the XML file.
CREATE EXTERNAL TABLE MYDATA
(
    NAME    array<string>
   ,AGE     array<int>
   ,SEX     array<string>    
)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES
    (
        "column.xpath.NAME" = "TAG/NAME/text()"
       ,"column.xpath.AGE"  = "TAG/AGE/text()"
       ,"column.xpath.SEX"  = "TAG/SEX/text()"
    )
    STORED AS 
    INPUTFORMAT     'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION        '/home/sid/hivexmltab'
    TBLPROPERTIES
    (
        "xmlinput.start" = "<TAG"
       ,"xmlinput.end"   = "</TAG>"
    )
;
select * from MYDATA
;
+-----------------+------------+-------------------+
|   mydata.name   | mydata.age |    mydata.sex     |
+-----------------+------------+-------------------+
| ["ABCD","EFGH"] | [25,23]    | ["male","female"] |
+-----------------+------------+-------------------+
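-- posexplode emits one row per element of NAME together with its position n;
-- the same position is then used to index into AGE and SEX.
select  NAME[pe.n]  as name
       ,AGE [pe.n]  as age
       ,SEX [pe.n]  as sex

from    MYDATA m
        lateral view posexplode (m.NAME) pe as n,x
;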
+------+-----+--------+
| name | age |  sex   |
+------+-----+--------+
| ABCD |  25 | male   |
| EFGH |  23 | female |
+------+-----+--------+
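
If the flattened rows are needed often, the query above could be wrapped in a view so that it can be queried like an ordinary table. This is only a sketch: the view name MYDATA_FLAT is hypothetical, and it relies on the three arrays being aligned (same length, same order), which holds for the sample data shown but is not guaranteed in general.

```sql
-- MYDATA_FLAT is a hypothetical view name chosen for illustration.
-- Assumes NAME, AGE and SEX arrays have matching positions for each person.
CREATE VIEW MYDATA_FLAT AS
select  m.NAME[pe.n]  as name
       ,m.AGE [pe.n]  as age
       ,m.SEX [pe.n]  as sex
from    MYDATA m
        lateral view posexplode (m.NAME) pe as n,x
;

-- Example use: filter on the flattened columns.
select * from MYDATA_FLAT where age > 24;
```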