Hive Spark SQL - Recursive reading in a folder

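I am trying to use HiveContext in Spark SQL to take advantage of some window functions in HiveQL. However, it does not read the data files in a folder recursively (the folders are partitioned by year and month).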

My folders:

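driver-data/Outputozie/22/year=2016

driver-data/Outputozie/22/year=2016/month=10

driver-data/Outputozie/22/year=2016/month=9

driver-data/Outputozie/22/year=2016/month=10/1

driver-data/Outputozie/22/year=2016/month=10/2

driver-data/Outputozie/22/year=2016/month=10/3

driver-data/Outputozie/22/year=2016/month=9/1

driver-data/Outputozie/22/year=2016/month=9/2

driver-data/Outputozie/22/year=2016/month=9/3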
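
To make the layout concrete, here is a minimal sketch of how a path such as year=2016/month=10/1 splits into Hive-style partition columns and plain subdirectories. The object PartitionPathSketch and the helper partitionValues are hypothetical names used only for illustration; this is not how Spark itself implements partition discovery.

```scala
// Hypothetical helper, not part of Spark or Hive: separates the key=value
// segments of a path (partition columns) from the plain directory segments.
object PartitionPathSketch {
  def partitionValues(path: String): (Map[String, String], Seq[String]) = {
    val segments = path.split("/").filter(_.nonEmpty).toSeq
    // Segments containing '=' are treated as partition columns.
    val (partitioned, plain) = segments.partition(_.contains("="))
    val columns = partitioned.map { segment =>
      val Array(key, value) = segment.split("=", 2)
      key -> value
    }.toMap
    (columns, plain)
  }

  def main(args: Array[String]): Unit = {
    val (columns, plain) = partitionValues("year=2016/month=10/1")
    println(s"partition columns: $columns")
    println(s"plain subdirectories: $plain")
  }
}
```

The trailing segment 1 is not a key=value pair, so the data files sit one directory level below the month partition. That extra level is why recursive reading matters for this layout.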

Here is how I set up my HiveContext:

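import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("Extraction process").setIfMissing("spark.master", "local[*]")
val sc = SparkContext.getOrCreate(conf)
// Hadoop-level settings: no _SUCCESS marker files, no Parquet summary metadata, recursive input directories
sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
sc.hadoopConfiguration.set("hive.mapred.supports.subdirectories", "true")
//val hiveContext = sqlContext.asInstanceOf[HiveContext]
// Note: this cast only succeeds if sqlContext is already a HiveContext instance
val hiveContext = sqlContext.asInstanceOf[HiveContext]
hiveContext.setConf("spark.sql.parquet.compression.codec", "snappy")
hiveContext.setConf("mapreduce.input.fileinputformat.input.dir.recursive", "true")
hiveContext.setConf("mapred.input.dir.recursive", "true")
hiveContext.setConf("hive.mapred.supports.subdirectories", "true")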

Reading the files:

hiveContext.read.parquet(URLDecoder.decode(partitionLocation.get.toString, "UTF-8"))  // ==> Exception: file not found
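
For reference, a small sketch of what the URL-decoding step does to a partition location before it reaches the Parquet reader. The encoded path below is a made-up example, since the actual value of partitionLocation is not shown here, and decodeLocation is a hypothetical wrapper around java.net.URLDecoder.decode.

```scala
import java.net.URLDecoder

object DecodeLocationSketch {
  // Hypothetical wrapper: decodes percent-escapes such as %3D ('=') in a location string.
  def decodeLocation(location: String): String =
    URLDecoder.decode(location, "UTF-8")

  def main(args: Array[String]): Unit = {
    // Assumed example value; the real location in the question is unknown.
    val encoded = "/tmp/example/year%3D2016/month%3D10"
    println(decodeLocation(encoded)) // prints /tmp/example/year=2016/month=10
  }
}
```

Note that URLDecoder.decode also turns a literal + into a space, so printing the decoded string is a quick way to confirm that the path handed to read.parquet is the one that exists on disk.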

But it works fine with SQLContext:

val sqlContext = new SQLContext(sc)
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
sqlContext.setConf("mapreduce.input.fileinputformat.input.dir.recursive", "true")

Thanks for any advice.