NullPointerException in SparkSession when reading Parquet files from S3 in Scala
When I try to read a Parquet file from a path on S3 in Spark/Scala, I keep getting the following exception:
java.lang.NullPointerException
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:142)
at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:789)
at org.apache.spark.sql.SparkSession.read(SparkSession.scala:656)
I create the SparkSession as follows:
val sparkConf = new SparkConf().setAppName("My spark app")
val spark = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()
spark.sparkContext.setLogLevel("WARN")
spark.sparkContext.hadoopConfiguration.set("java.library.path", "/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native")
spark.conf.set("spark.sql.parquet.mergeSchema", "true")
spark.conf.set("spark.speculation", "false")
spark.conf.set("spark.sql.crossJoin.enabled", "true")
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "true")
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.sparkContext.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
spark.sparkContext.hadoopConfiguration.setBoolean("mapreduce.fileoutputcommitter.cleanup.skipped", true)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"))
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"))
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3.amazonaws.com")
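As a point of comparison (not a fix for the NPE, just a sketch of the same setup under the assumption that the settings above are the ones needed), the same options can be supplied once at builder time; Hadoop-level keys such as `fs.s3a.*` can be passed with the `spark.hadoop.` prefix so they reach the Hadoop configuration without touching `sparkContext` afterwards:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: same settings as above, supplied via the builder.
// Option names are carried over from the question; the env vars
// AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are assumed to be set.
object SparkS3App {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("My spark app")
      .config("spark.sql.parquet.mergeSchema", "true")
      .config("spark.sql.crossJoin.enabled", "true")
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      // "spark.hadoop." prefix forwards the key into hadoopConfiguration
      .config("spark.hadoop.fs.s3a.access.key", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
      .config("spark.hadoop.fs.s3a.secret.key", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))
      .config("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com")
      .enableHiveSupport()
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")
  }
}
```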
Here is the line where I get the exception:
val df1 = spark.read.parquet(pathToRead)
What am I doing wrong? I tried it without setting the access key and secret key, but no luck. The same thing happens if I change the path to
s3a://my path/
val df1 = spark.read.parquet(pathToRead)