Loading XML data into a Hive table: org.apache.hadoop.hive.ql.metadata.HiveException

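I am trying to load XML data into Hive, but I get this error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}

The XML file I am using is: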

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>

1) CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/user/xmlfile.xml' OVERWRITE INTO TABLE xmltable;

2) CREATE VIEW xmlview (id,genre,price)
AS SELECT
xpath(xmldata, '/catalog[1]/book[1]/id'),
xpath(xmldata, '/catalog[1]/book[1]/genre'),
xpath(xmldata, '/catalog[1]/book[1]/price')
FROM xmltable;

3) CREATE TABLE xmlfinal AS SELECT * FROM xmlview;

4) SELECT * FROM xmlfinal WHERE id ='11';

Everything works up to the second query, but when I run the third query I get the following error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
    at org.apache.hadoop.hive.ql.exec

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

So what is going wrong? As far as I can tell, the XML file itself is correct.

Thanks,

Shree

Cause of the error:

1) Case 1 (your case): the XML content is being fed to Hive line by line.

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>

select count(*) from xmltable;  -- returns 13 rows: each line of the file becomes its own row in column xmldata

Why it fails:

The XML is read as 13 separate pieces rather than as one document, so what xpath receives for each row is not valid XML.
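To see why line-by-line input fails, the check below parses each line of the sample file on its own, roughly the way each row reaches xpath when the pretty-printed file is loaded into a TEXTFILE table. It is a minimal sketch in Python using the standard library's XML parser, not Hive's; the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

PRETTY_XML = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>"""


def is_well_formed(text):
    """Return True if text parses as a complete XML document."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


lines = PRETTY_XML.splitlines()
print(len(lines))                  # 13, one Hive row per line
print(is_well_formed(lines[0]))    # False: a declaration with no root element
print(is_well_formed(PRETTY_XML))  # True: the document as a whole is fine
```

The first line is exactly the row shown in the stack trace: an XML declaration with nothing after it.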

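2) Case 2: the XML content should be fed to Hive as a single string; then the XPath UDFs work. Syntax reference: all functions follow the form xpath_*(xml_string, xpath_expression_string).

input.xml: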

<?xml version="1.0" encoding="UTF-8"?><catalog><book><id>11</id><genre>Computer</genre><price>44</price></book><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>
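A file in this one-line form can be produced from the pretty-printed original before running LOAD DATA. The sketch below does this in Python, outside Hive. The function name flatten_xml and the file names in the usage comment are placeholders, and it assumes line breaks and indentation occur only between tags, as they do in the sample file.

```python
PRETTY_XML = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>
"""


def flatten_xml(text):
    """Join a pretty-printed XML document into a single line.

    Assumption: whitespace at the start and end of each line is only
    indentation between tags, so dropping it does not change element text.
    Trailing blank lines and trailing spaces are dropped as well.
    """
    return "".join(line.strip() for line in text.splitlines())


one_line = flatten_xml(PRETTY_XML)
print(one_line)

# Hypothetical usage with files (paths are placeholders):
# with open("xmlfile.xml", encoding="utf-8") as src, \
#         open("data-input.xml", "w", encoding="utf-8") as dst:
#     dst.write(flatten_xml(src.read()) + "\n")
```

If element text can itself span several lines, this simple join would alter it, and a real XML serializer would be the safer choice.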

select count(*) from xmltable;  -- returns 1 row: the XML is read as one complete document

This means:

xmldata   = <?xml version="1.0" encoding="UTF-8"?><catalog><book> ...... </catalog>

The array_index and numeric_range functions used below come from the Brickhouse jar.

Solution:

-- Load XML data into the table
DROP TABLE xmltable;
CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/vijay/data-input.xml' OVERWRITE INTO TABLE xmltable;

-- Check contents
SELECT * FROM xmltable;

-- Create view
DROP VIEW MyxmlView;
CREATE VIEW MyxmlView(id, genre, price) AS
SELECT
 xpath(xmldata, 'catalog/book/id/text()'),
 xpath(xmldata, 'catalog/book/genre/text()'),
 xpath(xmldata, 'catalog/book/price/text()')
FROM xmltable;

-- Check view
SELECT id, genre, price FROM MyxmlView;


ADD jar /home/vijay/brickhouse-0.7.0-SNAPSHOT.jar;  --Add brickhouse jar 

CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';

SELECT 
   array_index( id, n ) as my_id,
   array_index( genre, n ) as my_genre,
   array_index( price, n ) as my_price
from MyxmlView
lateral view numeric_range( size( id )) MyxmlView as n;

Output:

hive > SELECT
     >    array_index( id, n ) as my_id,
     >    array_index( genre, n ) as my_genre,
     >    array_index( price, n ) as my_price
     > from MyxmlView
     > lateral view numeric_range( size( id )) MyxmlView as n;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/vijay/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-07-09 05:36:45,220 null map = 0%,  reduce = 0%
2014-07-09 05:36:48,226 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
my_id      my_genre      my_price
11      Computer        44
44      Fantasy 5

Time taken: 8.541 seconds, Fetched: 2 row(s)
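What the view and the final query do can be mimicked outside Hive. The sketch below is a rough Python analogue, not Hive code: findall stands in for xpath (every match, as a list), findtext for xpath_string (first match only), and the index loop for numeric_range(size(id)) combined with array_index. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

ONE_LINE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<catalog>"
    "<book><id>11</id><genre>Computer</genre><price>44</price></book>"
    "<book><id>44</id><genre>Fantasy</genre><price>5</price></book>"
    "</catalog>"
)

root = ET.fromstring(ONE_LINE_XML.encode("utf-8"))  # root is <catalog>

# Analogue of xpath(xmldata, 'catalog/book/id/text()'): all matches as a list.
ids = [e.text for e in root.findall("book/id")]
genres = [e.text for e in root.findall("book/genre")]
prices = [e.text for e in root.findall("book/price")]

# Analogue of xpath_string(...): only the first matching node's text.
first_id = root.findtext("book/id")

# Analogue of numeric_range(size(id)) plus array_index(col, n):
# one output row per array position.
rows = [(ids[n], genres[n], prices[n]) for n in range(len(ids))]

print(ids)       # ['11', '44']
print(first_id)  # 11
print(rows)      # [('11', 'Computer', '44'), ('44', 'Fantasy', '5')]
```

Pairing the arrays by position only lines up correctly when every book has an id, a genre, and a price; a missing child would shift the later values.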

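Adding more information as requested by the question owner:

First try loading the file with add file <path-to-file>; that will solve your problem, as it did in my case.

Oracle XML Extensions for Hive can be used to create Hive tables over XML.

Then follow the same steps to get the desired result; just change the source data to this:

<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog>
<catalog><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>

Now you will get the answer as shown below:

["11"] ["Computer"] ["44"]

["44"] ["Fantasy"] ["5"]

If you apply the xpath_string and xpath_int UDFs, you will get the answer like this:

11 Computer 44

44 Fantasy 5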
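Source data with one complete document per line, as in this answer, could be generated from the original catalog rather than edited by hand. This is a minimal Python sketch under that assumption; the function name one_catalog_per_book is hypothetical, and it relies only on the standard library.

```python
import xml.etree.ElementTree as ET

ONE_LINE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<catalog>"
    "<book><id>11</id><genre>Computer</genre><price>44</price></book>"
    "<book><id>44</id><genre>Fantasy</genre><price>5</price></book>"
    "</catalog>"
)


def one_catalog_per_book(xml_text):
    """Return one single-line <catalog> document per <book> element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Drop whitespace-only text between tags so pretty-printed input
    # also yields single-line output.
    for el in root.iter():
        if el.text is not None and not el.text.strip():
            el.text = None
        if el.tail is not None and not el.tail.strip():
            el.tail = None
    documents = []
    for book in root.findall("book"):
        wrapper = ET.Element(root.tag)  # a fresh <catalog> around each book
        wrapper.append(book)
        documents.append(ET.tostring(wrapper, encoding="unicode"))
    return documents


for line in one_catalog_per_book(ONE_LINE_XML):
    print(line)
```

With this layout each Hive row holds a complete document for one book, so each xpath call returns a one-element array per row.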

Thanks

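Also make sure the XML file does not contain any whitespace after the last closing tag. In my case the source file had some, and whenever I loaded the file into Hive the resulting table contained NULL values. So whenever I applied the xpath functions, the result had some

Although the xpath_string function works, the xpath_double and xpath_int functions never worked for me. They keep throwing this exception: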

Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":""}

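Any update on the above post?

Hope you are getting [Fatal Error] :n:nn: Premature end of file. on the Hive terminal.

Hi, you are right, I should feed xmldata as a single string; now I can create xmlview without any error. But I am not getting the correct result. I used the same queries as posted above. If I use xpath_string instead of xpath, then when I run the fourth query, i.e. SELECT * FROM xmlfinal, I get [] as the result. Also, I get only the first row as output, i.e. 11 Computer 44. But I expect two rows to be returned. Why does XPATH not return any result?

Do you want output like this? Row 1: 11 Computer 44, row 2: 44 Fantasy 5

Yes, I want the output as row 1 and row 2:

11 Computer 44
44 Fantasy 5

But how can I do that?

Hi Vijay, I have a doubt. You suggested adding the jar, and I don't understand that part. Do I need to create the jar, or add an existing one? Can you tell me how to do it? And where (in which location) do I need to add the jar?

I added the snapshot jar you specified, and when I run

SELECT array_index(id, n) as my_id, array_index(genre, n) as my_genre, array_index(price, n) as my_price from xmlView lateral view numeric_range(size(id)) xmlView as n

I get the error below:

FAILED: SemanticException [Error 10016]: Line 6:34 Argument type mismatch 'id': "map" or "list" is expected at function SIZE, but "int" is found

When creating the view I used XPATH_int for id and XPATH_string for genre and price. I also tried using only XPATH, but the same error persists.

Please use XPATH as I did, because XPATH returns an array while XPATH_string returns only a single string. Create the view as shown in my answer. From the official Hive documentation: the xpath() function always returns a Hive array of strings, and the xpath_string() function returns the text of the first matching node. But we need all the books, not just one.

First, make sure MyxmlView is fine. Can you paste the result of: select id, genre, price from MyxmlView;

I used XPATH, and when I run select id, genre, price from xmlview; I get [] as the output. After adding the jar and creating the 2 temp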