Java: Why can't I read from AWS S3 in my Spark application?


I have upgraded to Apache Spark 1.5.1, but I am not sure whether that is the cause. I pass my access keys in spark-submit, and that has always worked:

Exception in thread "main" java.lang.NoSuchMethodError: org.jets3t.service.impl.rest.httpclient.RestS3Service.<init>(Lorg/jets3t/service/security/AWSCredentials;)V

    SQLContext sqlContext = new SQLContext(sc);
    DataFrame df = sqlContext.read()
        .format("com.databricks.spark.csv")
        .option("inferSchema", "true")
        .load("s3n://ossem-replication/gdelt_data/event_data/" + args[0]);

    df.write()
        .format("com.databricks.spark.csv")
        .save("/user/spark/ossem_data/gdelt/" + args[0]);
Here is more of the error. A class is missing that method, which means the dependencies do not match up: jets3t does not seem to contain the RestS3Service constructor (Lorg/jets3t/service/security/AWSCredentials;)V. Can someone explain this?

Exception in thread "main" java.lang.NoSuchMethodError: org.jets3t.service.impl.rest.httpclient.RestS3Service.<init>(Lorg/jets3t/service/security/AWSCredentials;)V
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at org.apache.hadoop.fs.s3native.$Proxy24.initialize(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:272)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1277)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1272)
    at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1312)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
    at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:101)
    at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:99)
    at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:82)
    at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:42)
    at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:74)
    at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
    at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
    at com.bah.ossem.spark.GdeltSpark.main(GdeltSpark.java:20)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
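One way to confirm that this is a jar mismatch rather than a coding error is to check, with plain reflection, which jar RestS3Service is actually loaded from and which constructors that version exposes (a rough diagnostic sketch; the class name is taken from the stack trace above):

    // Diagnostic sketch: report which jar provides RestS3Service and which
    // constructors that version of jets3t actually has.
    object Jets3tCheck {
      def main(args: Array[String]): Unit = {
        val cls = Class.forName("org.jets3t.service.impl.rest.httpclient.RestS3Service")
        // Jar the class was loaded from
        println(cls.getProtectionDomain.getCodeSource.getLocation)
        // Available constructors; the Hadoop S3 code expects one taking AWSCredentials
        cls.getConstructors.foreach(println)
      }
    }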
I had the same problem, but with Spark 1.6, and I was using Scala rather than Java. The error occurred because the hadoop-client that Spark Core depends on is version 2.2, while my Spark 1.6 cluster installation runs a newer Hadoop (2.6 in my case). To get it working I had to make the following changes:

  • Change the hadoop-client dependency to 2.6 (the Hadoop version I use); a combined sbt sketch follows this list.

  • Include the hadoop-aws library in my Spark fat jar, since this dependency is no longer bundled with the Hadoop libraries in 1.6:

    "org.apache.hadoop" % "hadoop-aws" % "2.6.0",
    
  • Export the AWS access key and secret key as environment variables.

  • Specify the following Hadoop configuration through the SparkContext's hadoopConfiguration:

    val sparkContext = new SparkContext(sparkConf)
    val hadoopConf = sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3.awsAccessKeyId", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
    hadoopConf.set("fs.s3.awsSecretAccessKey", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))
    
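Putting the first two items together, the dependency section of an sbt build might look like this (a sketch that assumes sbt with Spark 1.6.0 and Hadoop 2.6.0; adjust the versions to whatever your cluster actually runs):

    // build.sbt sketch: pin hadoop-client to the cluster's Hadoop version and add
    // hadoop-aws explicitly, since it is no longer pulled in transitively.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "1.6.0" % "provided",
      "org.apache.hadoop" %  "hadoop-client" % "2.6.0",
      "org.apache.hadoop" %  "hadoop-aws"    % "2.6.0"
    )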

  • It sounds like some API changes during the upgrade have invalidated your code. Have you read through them?