Rebuilding an index fails on Hive with Tez on Azure HDInsight
I am trying to create an index on Hive with Tez enabled on Azure HDInsight. I can create the index successfully, but I cannot rebuild it: the job fails with the following output:
Map 1: -/- Reducer 2: 0/1
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1421234198072_0091_1_01, diagnostics=[Vertex Input: measures initializer failed.]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1421234198072_0091_1_00, diagnostics=[Vertex > received Kill in INITED state.]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
I created the table and the indexes with the following job:
DROP TABLE IF EXISTS Measures;
CREATE TABLE Measures(
topology string,
val double,
date timestamp
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE LOCATION 'wasb://<mycontainer>@<mystorage>.blob.core.windows.net/';
CREATE INDEX measures_index_topology ON TABLE Measures (topology) AS 'COMPACT' WITH DEFERRED REBUILD;
CREATE INDEX measures_index_date ON TABLE Measures (date) AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX measures_index_topology ON Measures REBUILD;
ALTER INDEX measures_index_date ON Measures REBUILD;
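Where am I going wrong? Why does my index rebuild fail?
Best regards
It looks like Tez may have a problem generating an index on an empty table. I was able to reproduce the same error (without using the JSON SerDe). If you look at the application logs for the failed DAG, you will probably see something like this: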
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:299)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getSplits(TezGroupedSplitsInputFormat.java:68)
at org.apache.tez.mapreduce.hadoop.MRHelpers.generateOldSplits(MRHelpers.java:263)
at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:139)
at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:154)
at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:146)
...
If you populate the table with a single dummy record, it seems to work fine. I used:
INSERT INTO TABLE Measures SELECT market,0,0 FROM hivesampletable limit 1;
After that, the index rebuild ran without errors.
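Put together, the workaround might look like the sketch below. It reuses the table and index names from the question and the hivesampletable sample table from the INSERT above. The COUNT(*) check is only an added illustration for confirming that the table is empty and was not part of the steps actually tested here.

-- Illustrative check (an assumption, not from the tested steps):
-- a result of 0 means the table is empty, which is the case that failed.
SELECT COUNT(*) FROM Measures;

-- Seed the table with one placeholder row, as described above.
INSERT INTO TABLE Measures SELECT market, 0, 0 FROM hivesampletable LIMIT 1;

-- Rebuild both deferred indexes now that the table has data.
ALTER INDEX measures_index_topology ON Measures REBUILD;
ALTER INDEX measures_index_date ON Measures REBUILD;

Keep in mind that the placeholder row stays in the table, so it may need to be filtered out or removed once real data has been loaded.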