Hive: How do I load an XML data file into a Hive table?

When loading an XML data file into a Hive table, I get the following error message:

FAILED: SemanticException 7:9 Input format must implement InputFormat. Error encountered near token 'StoresXml'.

This is how I load the XML file:

Create the table StoresXml:

   CREATE EXTERNAL TABLE StoresXml (storexml string)
   STORED AS INPUTFORMAT 'org.apache.mahout.classifier.bayes.XmlInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   LOCATION '/user/hive/warehouse/stores';

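The location /user/hive/warehouse/stores is in HDFS.

Then load the data from the input path into the table StoresXml.

Now the problem is that whenever I select any column from the StoresXml table, the error above occurs.

Please help. Where am I going wrong?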

1. First, create a single-column table, for example:

CREATE TABLE xmlsample(xml string);

2. Next, load the data from the local file system or HDFS into the Hive table, for example:

LOAD DATA INPATH '---------' INTO TABLE XMLSAMPLE;

3. Then query the XML with Hive's XPath UDFs, such as xpath (which returns an array of strings), xpath_string, and the like.

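I developed a tool that generates Hive scripts from CSV files. Below are a few examples of how the files are generated.

Select the CSV files using Browse and set the Hadoop root directory, e.g. /user/bigdataproject/

The tool generates a Hadoop script from all the CSV files; below is the generated Hadoop script for inserting the CSV data into Hadoop.

Sample of the generated Hive script

Thanks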

Vijay

I just loaded this transactions.xml file into a Hive table using xpath. For an XML file, first put each record of the XML file on a single line:

terminal> cat /home/cloudera/Desktop/Test/Transactions_xml.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</record>|</record>\n|g' | grep -v '^\s*$' > /home/cloudera/Desktop/trx_xml.xml

terminal> hadoop fs -put /home/cloudera/Desktop/trx_xml.xml /user/cloudera/DataTest/Transactions_xml

hive> create table Transactions_xml1(xmldata string);

hive> load data inpath '/user/cloudera/DataTest/Transactions_xml' overwrite into table Transactions_xml1;

hive> create table Transactions_xml(trx_id int, account int, amount int);

hive> insert overwrite table Transactions_xml
      select xpath_int(xmldata,'record/Tid'),
             xpath_int(xmldata,'record/AccounID'),
             xpath_int(xmldata,'record/Amount')
      from Transactions_xml1;
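
The record-per-line step in this answer can be tried locally before anything is uploaded to HDFS. The following is only a minimal sketch under stated assumptions: the temporary file names and the two sample records are invented for illustration, the element names mirror the ones used in the queries above, and the ampersand-stripping step is left out. It joins all lines of the file and then breaks after each closing record tag, so that every output line holds one complete record.

```shell
#!/bin/sh
# Sketch: flatten a multi-line XML file so that each <record> sits on one line.
# File names and sample values are hypothetical, not taken from the original post.
workdir=$(mktemp -d)
in="$workdir/sample.xml"
out="$workdir/sample_flat.xml"

# A tiny two-record sample spread over several lines.
cat > "$in" <<'EOF'
<record>
  <Tid>1</Tid>
  <AccounID>100</AccounID>
  <Amount>250</Amount>
</record>
<record>
  <Tid>2</Tid>
  <AccounID>101</AccounID>
  <Amount>75</Amount>
</record>
EOF

# Replace newlines and carriage returns with spaces, start a new line after
# every </record>, and drop lines that contain only whitespace.
tr '\n\r' '  ' < "$in" \
  | sed 's|</record>|</record>\
|g' \
  | grep -v '^[[:space:]]*$' > "$out"

# One output line per record is what a single-column Hive table expects.
record_count=$(wc -l < "$out" | tr -d ' ')
echo "records: $record_count"
```

The backslash-newline inside the sed replacement is used instead of `\n` so that the sketch also behaves the same with non-GNU sed. If the real file has an XML declaration or a root element around the records, those would end up on the first and last lines and would need separate handling.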

I hope this helps. Let me know the result.

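So for every line in the input file I will get one entry in the xmlsample table. That means that if a line of the XML file is not a well-formed XML fragment, the corresponding entry in xmlsample will not be well-formed either. I guess the XPath functions (xpath, xpath_string, and so on) will not work then?

Your tool is for CSV files; the OP asked about XML. You should remove this, because it does not answer the question.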
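
The concern raised in the comments, that a line which is not a complete XML fragment produces a row the XPath functions cannot parse, can be checked before loading. The sketch below is a rough pattern check rather than a real XML parser, and it assumes that every record is wrapped in a `record` element; the file name and sample lines are invented for illustration.

```shell
#!/bin/sh
# Sketch: flag lines that do not look like one complete <record>...</record>.
# This is a heuristic, not an XML well-formedness check; the data is hypothetical.
workdir=$(mktemp -d)
flat="$workdir/flat.xml"

# Lines 2 and 3 together form one record that was split across two lines.
cat > "$flat" <<'EOF'
<record><Tid>1</Tid><Amount>250</Amount></record>
<record><Tid>2</Tid>
<Amount>75</Amount></record>
<record><Tid>3</Tid><Amount>10</Amount></record>
EOF

pattern='^[[:space:]]*<record>.*</record>[[:space:]]*$'

# Count the lines that are NOT a complete record.
bad_lines=$(grep -c -v -E "$pattern" "$flat")
echo "lines that are not a complete record: $bad_lines"

# List them with line numbers so the input can be fixed before LOAD DATA.
grep -n -v -E "$pattern" "$flat"
```

A non-zero count suggests the file still needs the one-record-per-line flattening before it is loaded into the single-column table.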

Generated Hive script sample (from the CSV tool answer):

CREATE DATABASE IF NOT EXISTS lahman;

USE lahman;

CREATE TABLE AllstarFull (
  playerID string, yearID string, gameNum string, gameID string,
  teamID string, lgID string, GP string, startingPos string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;

SELECT * FROM AllstarFull;