Hive: How to load an XML data file into a Hive table?
When loading an XML data file into a Hive table, I get the following error message:
FAILED: SemanticException 7:9 Input format must implement InputFormat. Error encountered near token 'StoresXml'.
This is how I load the XML file:
**Create a table StoresXml
CREATE EXTERNAL TABLE StoresXml (storexml string)
STORED AS INPUTFORMAT 'org.apache.mahout.classifier.bayes.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/stores';
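**The location /user/hive/warehouse/stores is in HDFS.
Loaded the data into table StoresXml with LOAD DATA INPATH.
Now the problem is that when I select any column from the StoresXml table, I get the error above.
Please help me. Where am I going wrong?
1. First, create a single-column table, like: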
CREATE TABLE xmlsample(xml string);
2. After that, load the data from local/HDFS into the Hive table, like:
LOAD DATA INPATH '---------' INTO TABLE XMLSAMPLE;
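3. Next, query the XML with XPATH, XPATH_ARRAY, XPATH_STRING and similar functions.
I developed a tool that generates Hive scripts from CSV files. Below are a few examples of how the files are generated.
Tool: select the CSV files with Browse and set the Hadoop root directory, e.g. /user/bigdataproject/
The tool generates Hadoop scripts from all the CSV files. Below are:
[Screenshot: generating a Hadoop script to insert the CSV into Hadoop]
[Screenshot: sample generated Hive script]
Thanks,
Vijay
I just loaded this transactions.xml file into a Hive table using xpath.
For the XML file:
**Put each record of the XML file on a single line: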
terminal> cat /home/cloudera/Desktop/Test/Transactions_xml.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</record>|</record>\n|g' | grep -v '^\s*$' > /home/cloudera/Desktop/trx_xml.xml;
terminal> hadoop fs -put /home/cloudera/Desktop/trx_xml.xml /user/cloudera/DataTest/Transactions_xml
hive>create table Transactions_xml1(xmldata string);
hive>load data inpath '/user/cloudera/DataTest/Transactions_xml' overwrite into table Transactions_xml1;
hive>create table Transactions_xml(trx_id int,account int,amount int);
hive>insert overwrite table Transactions_xml select xpath_int(xmldata,'record/Tid'),
xpath_int(xmldata,'record/AccounID'),
xpath_int(xmldata,'record/Amount') from Transactions_xml1;
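The flattening step can be wrapped in a reusable function, plus a check run before `hadoop fs -put`. This is a minimal sketch, assuming GNU sed (for `\n` in the replacement) and records wrapped in plain `<record>...</record>` tags as in the transactions file; the names `flatten_records` and `check_records` are hypothetical. Unlike the pipeline above, it keeps only the text from `<record>` to `</record>`, so an XML declaration or root element does not end up in the first or last row. It keeps the original `tr -d '&'`, which also mangles escaped entities such as `&amp;`. The check matters because a text table with one string column gets one row per line, so a record split across lines becomes fragments that are not well-formed XML.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original answer). Assumes GNU sed and
# records wrapped in plain <record>...</record> tags.

# flatten_records IN OUT: write one <record>...</record> element per line.
flatten_records() {
  tr -d '&' < "$1" \
    | tr '\r\n' '  ' \
    | sed 's|</record>|</record>\n|g' \
    | sed -n 's|.*\(<record>.*</record>\).*|\1|p' > "$2"
  # 1. drop '&' as the original did; 2. join all lines into one;
  # 3. break after every </record>; 4. keep only the record element on each
  #    line, dropping <?xml ...?>, the root tags and blank leftovers.
}

# check_records FILE: fail if any line is not exactly one <record>...</record>.
check_records() {
  if grep -qv '^<record>.*</record>$' "$1"; then
    echo "malformed line(s) in $1" >&2
    return 1
  fi
}
```

For example, run `flatten_records` on Transactions_xml.xml, then `check_records` on its output, and only then do the `hadoop fs -put` and `load data inpath` steps.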
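I hope this helps. Let me know the result.
So for each line in the input file, I will have one entry in the xmlsample table. That means that if a line in the XML file is not a well-formed XML fragment, the entry in xmlsample won't be well-formed either. I guess the XPATH, XPATH_ARRAY, XPATH_STRING, etc. functions won't work then?
Your tool is for CSV files, but the OP asked about XML. You should delete this, as it is not an answer to this question.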
CREATE DATABASE IF NOT EXISTS lahman;
USE lahman;
CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
SELECT * FROM AllstarFull;
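The answer doesn't include the tool's code. As a rough sketch of what generating a script like the one above could involve, the hypothetical `gen_hive_script` function reads a CSV file's header line and emits the same five statements: one `string` column per header field and a LOAD DATA from the given HDFS root directory. It assumes the first line is a header of plain comma-separated names with no quoting, and that the table is named after the file. Like the script above, it does not skip the header row when loading.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the author's tool. Assumes line 1 of the CSV is a
# header of plain comma-separated column names (no quoted commas).

# gen_hive_script CSV_FILE HDFS_ROOT DATABASE: print a Hive script to stdout.
gen_hive_script() {
  local csv=$1 root=$2 db=$3
  local name table header cols
  name=$(basename "$csv")
  table=${name%.*}                          # AllstarFull.csv -> AllstarFull
  header=$(head -n 1 "$csv" | tr -d '\r')   # tolerate CRLF line endings
  # "a,b,c" -> "a string,b string,c string"
  cols=$(printf '%s\n' "$header" | sed 's/,/ string,/g; s/$/ string/')
  printf 'CREATE DATABASE IF NOT EXISTS %s;\n' "$db"
  printf 'USE %s;\n' "$db"
  printf "CREATE TABLE %s (%s) row format delimited fields terminated by ',' stored as textfile;\n" "$table" "$cols"
  # ${root%/} drops a trailing slash such as the one in /user/bigdataproject/
  printf "LOAD DATA INPATH '%s/%s' OVERWRITE INTO TABLE %s;\n" "${root%/}" "$name" "$table"
  printf 'SELECT * FROM %s;\n' "$table"
}
```

Calling `gen_hive_script AllstarFull.csv /user/bigdataproject/ lahman` on a file whose header matches the columns above should print the same statements as the generated script.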