Scala: Cannot access an AWS bucket from Google Dataproc using Spark 2.4.4

Tags: scala, apache-spark, amazon-s3, sbt

I am using the following code to access some files stored on S3:

val spark = SparkSession.builder()
      .enableHiveSupport()
      .config("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
      .config("fs.s3n.awsAccessKeyId", <Access_key>)
      .config("fs.s3n.awsSecretAccessKey", <SecretAccessKey>)
      .getOrCreate()


val df = spark.read.orc(<s3 bucket name>)
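For reference, this is roughly what the same code looks like with the placeholders spelled out; the bucket name and path below are invented, and the credentials are the same placeholders as above. It also shows a variant that puts the s3n keys directly on the underlying Hadoop configuration instead of on the session builder (I have not confirmed that this behaves any differently):

// Variant: set the same s3n keys directly on the Hadoop configuration
// (<Access_key> and <SecretAccessKey> are the same placeholders as above).
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3n.awsAccessKeyId", "<Access_key>")
hadoopConf.set("fs.s3n.awsSecretAccessKey", "<SecretAccessKey>")

// The read itself, spelled out with an invented bucket and path;
// the URI scheme has to match the configured connector (s3n here).
val df = spark.read.orc("s3n://my-bucket/path/to/orc-data")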
I am running the Spark jar with the following command:

gcloud dataproc jobs submit spark \
    --project <project_name> \
    --region <region> \
    --cluster <cluster name> \
    --class <main class> \
    --properties spark.jars.packages='net.java.dev.jets3t:jets3t:0.9.4' \
    --jars gs://<bucket_name>/jars/sample_s3-assembly-0.1.jar,gs://<bucket_name>/jars/jets3t-0.9.4.jar
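For completeness, a variant of the same submit command that also passes the credentials through Spark's spark.hadoop.* property pass-through would look roughly like this (same placeholders as above; I have not verified that it makes a difference):

gcloud dataproc jobs submit spark \
    --project <project_name> \
    --region <region> \
    --cluster <cluster name> \
    --class <main class> \
    --properties spark.jars.packages='net.java.dev.jets3t:jets3t:0.9.4',spark.hadoop.fs.s3n.awsAccessKeyId=<Access_key>,spark.hadoop.fs.s3n.awsSecretAccessKey=<SecretAccessKey> \
    --jars gs://<bucket_name>/jars/sample_s3-assembly-0.1.jar,gs://<bucket_name>/jars/jets3t-0.9.4.jar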
I have also added entries for jets3t, hadoop-aws and hadoop-client in sbt, as suggested in another thread, but I still get the same error. My build.sbt is:

name := "sample_s3_ht"

version := "0.1"

scalaVersion := "2.11.12"


resolvers += Opts.resolver.sonatypeReleases
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.1"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.4.0"
libraryDependencies += "com.google.cloud" % "google-cloud-bigquery" % "1.80.0"
libraryDependencies += "com.google.cloud" % "google-cloud-storage" % "1.98.0"
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery" % "0.7.0-beta"
libraryDependencies += "com.google.cloud.bigdataoss" % "gcs-connector" % "1.6.1-hadoop2"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.4"


assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
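The assemblyMergeStrategy setting above needs the sbt-assembly plugin; a minimal project/plugins.sbt enabling it would look like this (the plugin version shown is only an example):

// project/plugins.sbt -- sbt-assembly provides the assembly task and assemblyMergeStrategy.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")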