Scala Spark 2.2.0 - unable to read directory structure recursively


Problem summary: Despite setting the required Hadoop configuration, I am unable to read nested subdirectories from my Spark program (see ). I have pasted the error below.

Any help is appreciated.

Versions: Spark 2.2.0

Input directory layout:

/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=1502939225073/part-00000-3a44cd00-e895-4a01-9ab9-946064b739d4-c000.parquet
/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=1502939234036/part-00000-cbd47353-0590-4cc1-b10d-c18886df1c25-c000.parquet

Input directory parameter passed:

/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/*/*

Attempt (1):

Setting the parameter in code:

val sparkSession: SparkSession = SparkSession.builder().master("yarn").getOrCreate()

//Recursive glob support & loglevel
import sparkSession.implicits._
sparkSession.sparkContext.hadoopConfiguration.setBoolean("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", true)
Did not see the configuration show up in the Spark UI.
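One thing worth checking here (an observation about Spark's configuration model, not something stated in the original post): the `spark.hadoop.` prefix is only meaningful to spark-submit / SparkConf, which strips it and forwards the remainder into the Hadoop `Configuration`. When writing to `hadoopConfiguration` directly, it is the bare Hadoop key that `FileInputFormat` consults. A minimal sketch of that variant:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("yarn").getOrCreate()

// Setting hadoopConfiguration directly: use the bare Hadoop key.
// FileInputFormat reads "mapreduce.input.fileinputformat.input.dir.recursive";
// the "spark.hadoop."-prefixed variant is only recognized by spark-submit/SparkConf.
spark.sparkContext.hadoopConfiguration
  .setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true)
```

This would also explain why the prefixed key never appears to take effect, regardless of what the Spark UI shows.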

Attempt (2):

Passed the configuration via spark-submit on the CLI, and also set it in code (see below).

I do see the configuration in the Spark UI, but I get the same error - it cannot traverse into the directory structure.

Code:

//Spark Session
val sparkSession: SparkSession = SparkSession.builder().master("yarn").getOrCreate()

//Recursive glob support
val conf = new SparkConf()
val cliRecursiveGlobConf = conf.get("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive")
import sparkSession.implicits._
sparkSession.sparkContext.hadoopConfiguration.set("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", cliRecursiveGlobConf)
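A side note on one fragility in the snippet above (my observation, not part of the original post): `SparkConf.get(key)` throws a `NoSuchElementException` if the key was not supplied on the spark-submit command line, which would kill the job before the read is even attempted. A more defensive retrieval, assuming the same key name:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
// get(key, default) returns "false" instead of throwing NoSuchElementException
// when the flag was not passed via --conf on spark-submit.
val cliRecursiveGlobConf =
  conf.get("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "false")
```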
Error and overall output:

The full error is located at -


Have you tried
/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=*/
? @Falan. Yes - same error. Then try checking whether the files actually exist. @Falan, I did those basic checks before posting this question. Was the directory structure created by Spark itself, by writing with
partitionBy
? If so, I think you could try
spark.read.parquet("/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/*")
. Let me know if this works.
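Following up on the partitionBy suggestion in the comments: since the `batch_id=...` layout matches Spark's partitioned-parquet convention, the DataFrame reader can normally discover the partitions itself when pointed at the parent directory - no Hadoop recursion flag is involved on this code path. A minimal sketch, assuming the paths from the question:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("yarn").getOrCreate()

// Pointing the parquet reader at the directory above the batch_id=... folders
// triggers partition discovery; batch_id surfaces as a column in the DataFrame.
val df = spark.read.parquet(
  "/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format")

df.printSchema() // schema should include a batch_id partition column
```

The `mapreduce.input.fileinputformat.input.dir.recursive` setting matters for the RDD-based `FileInputFormat` APIs (e.g. `sc.textFile`); for `spark.read.parquet`, partition discovery or an explicit glob is usually the simpler route.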