NullPointerException in SparkSession when reading a Parquet file on S3 (Scala)

Tags: scala, apache-spark, amazon-s3, parquet

Whenever I try to read a Parquet file from an S3 path in Spark/Scala, I keep running into the following exception:

java.lang.NullPointerException
        at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
        at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:142)
        at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:789)
        at org.apache.spark.sql.SparkSession.read(SparkSession.scala:656)
I create the SparkSession like this:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf().setAppName("My spark app")
val spark = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()
spark.sparkContext.setLogLevel("WARN")

// Native library path for the Hadoop/LZO codecs
spark.sparkContext.hadoopConfiguration.set("java.library.path", "/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native")

// SQL options
spark.conf.set("spark.sql.parquet.mergeSchema", "true")
spark.conf.set("spark.speculation", "false")
spark.conf.set("spark.sql.crossJoin.enabled", "true")
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "true")
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// Output committer settings
spark.sparkContext.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
spark.sparkContext.hadoopConfiguration.setBoolean("mapreduce.fileoutputcommitter.cleanup.skipped", true)

// S3A credentials and endpoint
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"))
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"))
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3.amazonaws.com")
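A side note on the configuration above: core Spark properties such as spark.speculation are fixed once the SparkContext exists, so setting them through spark.conf.set after getOrCreate may silently have no effect. A minimal sketch, reusing the same option values as above, of passing them through the builder instead:

import org.apache.spark.sql.SparkSession

// Sketch: supply core and SQL properties before the SparkContext is created,
// instead of mutating the conf after getOrCreate.
val spark = SparkSession.builder
  .appName("My spark app")
  .config("spark.speculation", "false")
  .config("spark.sql.parquet.mergeSchema", "true")
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .enableHiveSupport()
  .getOrCreate()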
And here is the line where the exception occurs:

val df1 = spark.read.parquet(pathToRead)

What am I doing wrong? I also tried without setting the access key and secret key, with no luck.

What if you change the path to s3a://my path/ and read it the same way?

val df1 = spark.read.parquet(pathToRead)
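For reference, a minimal self-contained sketch of reading Parquet over the s3a:// scheme; the bucket and prefix are hypothetical placeholders, and it assumes the hadoop-aws module (with a matching AWS SDK) is on the classpath:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("Read parquet from S3 sketch")
  .getOrCreate()

// With s3a, credentials can also come from the default provider chain
// (environment variables, instance profile), so the explicit
// fs.s3a.access.key / fs.s3a.secret.key settings are optional.
val pathToRead = "s3a://my-bucket/some/prefix/" // hypothetical placeholder
val df1 = spark.read.parquet(pathToRead)
df1.printSchema()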