
Hadoop: Hive SQL functions get stuck during execution


I am running Hive 1.2.1 and Hadoop 2.6.0 in a single-node distribution. I have a simple JSON file in HDFS at /user/<name>/test.json.

I create a Hive table and load this data with the following script:

CREATE EXTERNAL TABLE json_table (json string) LOCATION '/user/<name>/test.json' 
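For reference, the table has a single string column holding one JSON document per line. A minimal sketch of what test.json could contain, inferred from the get_json_object output shown further down (the exact documents are an assumption, not taken from the original post):

{"doc": 123}
{"doc": 456}
{"doc": 789}
{"doc": 345}
{"doc": 987}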
Now, the problem is this:

select count(*) from json_table;
This query launches an MR job behind the scenes, but it never finishes. I have to kill the query manually from the command line. Even with logging enabled on the console, I cannot infer much from the output.

In short, any query that uses a SQL function gets stuck in Hive!! Am I missing some JARs required by these SQL functions that need to be placed in HDFS?
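For what it's worth, missing function JARs can be ruled out without launching a job: count and get_json_object are built-in Hive UDFs, and the CLI resolves them at compile time. A minimal sanity check (standard HiveQL; a hedged suggestion added here, not part of the original post):

-- these run locally and fail fast if a function is not registered
DESCRIBE FUNCTION count;
DESCRIBE FUNCTION get_json_object;
SHOW FUNCTIONS;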

Pasting the stack trace from the command line:

hive> select get_json_object(json_table.json,'$.doc') from json_table;

OK
123
456
789
345
987
hive> select count(*) from json_table;
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:22 [main]: INFO parse.ParseDriver: Parsing command: select count(*) from json_table
15/11/28 21:44:23 [main]: INFO parse.ParseDriver: Parse Completed   
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=parse start=1448765062471 end=1448765063122 duration=651 from=org.apache.hadoop.hive.ql.Driver>    
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>    
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Starting Semantic Analysis    
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Completed phase 1 of Semantic Analysis    
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Get metadata for source tables    
15/11/28 21:44:23 [main]: INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=json_table    
15/11/28 21:44:23 [main]: INFO HiveMetaStore.audit: ugi=sriramvaradharajan  ip=unknown-ip-addr  cmd=get_table : db=default tbl=json_table       
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Get metadata for subqueries    
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Get metadata for destination tables    
15/11/28 21:44:23 [main]: INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1    
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Completed getting MetaData in Semantic Analysis    
15/11/28 21:44:23 [main]: INFO parse.BaseSemanticAnalyzer: Not invoking CBO because the statement has too few joins    
15/11/28 21:44:23 [main]: INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10000/.hive-staging_hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Set stats collection dir : hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10000/.hive-staging_hive_2015-11-28_21-44-22_469_6217205930037435432-1/-ext-10002    
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for FS(6)    
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for SEL(5)    
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for GBY(4)    
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for RS(3)    
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for GBY(2)    
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for SEL(1)
15/11/28 21:44:23 [main]: INFO ppd.OpProcFactory: Processing for TS(0)    
15/11/28 21:44:23 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 3 oldColExprMap: {VALUE._col0=Column[_col0]}    
15/11/28 21:44:23 [main]: INFO optimizer.ColumnPrunerProcFactory: RS 3 newColExprMap: {VALUE._col0=Column[_col0]}    
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>    
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1448765063731 end=1448765063732 duration=1 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>    
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable    
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans    
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
15/11/28 21:44:23 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
15/11/28 21:44:23 [main]: INFO parse.CalcitePlanner: Completed plan generation
15/11/28 21:44:23 [main]: INFO ql.Driver: Semantic Analysis Completed
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1448765063125 end=1448765063749 duration=624 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO exec.ListSinkOperator: Initializing operator OP[7]
15/11/28 21:44:23 [main]: INFO exec.ListSinkOperator: Initialization Done 7 OP
15/11/28 21:44:23 [main]: INFO exec.ListSinkOperator: Operator 7 OP initialized
15/11/28 21:44:23 [main]: INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=compile start=1448765062445 end=1448765063772 duration=1327 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO ql.Driver: Starting command(queryId=sriramvaradharajan_20151128214422_cd5ef5dc-1862-4aeb-a4d1-361dc7be0056): select count(*) from json_table
Query ID = sriramvaradharajan_20151128214422_cd5ef5dc-1862-4aeb-a4d1-361dc7be0056
15/11/28 21:44:23 [main]: INFO ql.Driver: Query ID = sriramvaradharajan_20151128214422_cd5ef5dc-1862-4aeb-a4d1-361dc7be0056
Total jobs = 1
15/11/28 21:44:23 [main]: INFO ql.Driver: Total jobs = 1
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1448765062445 end=1448765063775 duration=1330 from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=task.MAPRED.Stage-1 from=org.apache.hadoop.hive.ql.Driver>
Launching Job 1 out of 1
15/11/28 21:44:23 [main]: INFO ql.Driver: Launching Job 1 out of 1
15/11/28 21:44:23 [main]: INFO ql.Driver: Starting task [Stage-1:MAPRED] in serial mode
Number of reduce tasks determined at compile time: 1
15/11/28 21:44:23 [main]: INFO exec.Task: Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
15/11/28 21:44:23 [main]: INFO exec.Task: In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
15/11/28 21:44:23 [main]: INFO exec.Task:   set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
15/11/28 21:44:23 [main]: INFO exec.Task: In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
15/11/28 21:44:23 [main]: INFO exec.Task:   set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
15/11/28 21:44:23 [main]: INFO exec.Task: In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
15/11/28 21:44:23 [main]: INFO exec.Task:   set mapreduce.job.reduces=<number>
15/11/28 21:44:23 [main]: INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO mr.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
15/11/28 21:44:23 [main]: INFO exec.Utilities: Processing alias json_table
15/11/28 21:44:23 [main]: INFO exec.Utilities: Adding input file hdfs://localhost:9000/user/sriramvaradharajan
15/11/28 21:44:23 [main]: INFO exec.Utilities: Content Summary not cached for hdfs://localhost:9000/user/sriramvaradharajan
15/11/28 21:44:23 [main]: INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: INFO exec.Utilities: Serializing MapWork via kryo
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=serializePlan start=1448765063830 end=1448765063973 duration=143 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: INFO Configuration.deprecation: mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
15/11/28 21:44:23 [main]: INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: INFO exec.Utilities: Serializing ReduceWork via kryo
15/11/28 21:44:23 [main]: INFO log.PerfLogger: </PERFLOG method=serializePlan start=1448765063980 end=1448765063994 duration=14 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/11/28 21:44:23 [main]: ERROR mr.ExecDriver: yarn
15/11/28 21:44:24 [main]: INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/11/28 21:44:24 [main]: INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/11/28 21:44:24 [main]: INFO exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10004/be3b1969-bd83-4db2-b51c-c5e7b6596f79/map.xml
15/11/28 21:44:24 [main]: INFO exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10004/be3b1969-bd83-4db2-b51c-c5e7b6596f79/reduce.xml
15/11/28 21:44:24 [main]: WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/11/28 21:44:24 [main]: INFO log.PerfLogger: <PERFLOG method=getSplits from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/11/28 21:44:24 [main]: INFO exec.Utilities: PLAN PATH = hdfs://localhost:9000/tmp/hive/sriramvaradharajan/2c73a21d-3600-4325-8f71-811db3f5368d/hive_2015-11-28_21-44-22_469_6217205930037435432-1/-mr-10004/be3b1969-bd83-4db2-b51c-c5e7b6596f79/map.xml
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: Total number of paths: 1, launching 1 threads to check non-combinable ones.
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://localhost:9000/user/sriramvaradharajan; using filter path hdfs://localhost:9000/user/sriramvaradharajan
15/11/28 21:44:24 [main]: INFO input.FileInputFormat: Total input paths to process : 2
15/11/28 21:44:24 [main]: INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: number of splits 1
15/11/28 21:44:24 [main]: INFO io.CombineHiveInputFormat: Number of all splits 1
15/11/28 21:44:24 [main]: INFO log.PerfLogger: </PERFLOG method=getSplits start=1448765064446 end=1448765064479 duration=33 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/11/28 21:44:24 [main]: INFO mapreduce.JobSubmitter: number of splits:1
15/11/28 21:44:24 [main]: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1448752941502_0008
15/11/28 21:44:24 [main]: INFO impl.YarnClientImpl: Submitted application application_1448752941502_0008    
15/11/28 21:44:24 [main]: INFO mapreduce.Job: The url to track the job: http://admins-MacBook-Pro.local:8088/proxy/application_1448752941502_0008/
Starting Job = job_1448752941502_0008, Tracking URL = http://admins-MacBook-Pro.local:8088/proxy/application_1448752941502_0008/
15/11/28 21:44:24 [main]: INFO exec.Task: Starting Job = job_1448752941502_0008, Tracking URL = http://admins-MacBook-Pro.local:8088/proxy/application_1448752941502_0008/
Kill Command = /usr/local/Cellar/hadoop/2.6.0/bin/hadoop job  -kill job_1448752941502_0008
15/11/28 21:44:24 [main]: INFO exec.Task: Kill Command = /usr/local/Cellar/hadoop/2.6.0/bin/hadoop job  -kill job_1448752941502_0008
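The trace stops here: the job is submitted to YARN (application_1448752941502_0008) and never progresses. On a single-node setup this pattern usually means the application sits in the ACCEPTED state waiting for container resources rather than failing outright. A hedged diagnostic sketch from a second shell (standard Hadoop 2.6 CLI commands; these checks are an addition, not output from this run):

# Is the application stuck in ACCEPTED, i.e. never granted a container?
yarn application -list -appStates ACCEPTED,RUNNING

# Is a NodeManager registered with the ResourceManager?
yarn node -list

# Are all daemons up? Expect NameNode, DataNode, ResourceManager, NodeManager.
jps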