Amazon Web Services spark-redshift-s3: classpath conflicts

Tags: amazon-web-services, apache-spark, amazon-s3, amazon-redshift, databricks

I am trying to connect to Redshift from a Spark 2.1.0 standalone cluster on AWS, using Hadoop 2.7.2 and Alluxio, and I get the following error:

Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager

Amazon tends to change its libraries' APIs fast enough that every version of hadoop-aws.jar has to be kept in sync with the AWS SDK; for Hadoop 2.7.x that is v1.7.4 of the SDK. As things stand you are unlikely to get Redshift and s3a to coexist, although you can keep using the older s3n URLs.
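A minimal sbt sketch of what "in sync" means here (illustrative, not from the question): Hadoop 2.7.x's hadoop-aws was built against the monolithic aws-java-sdk 1.7.4, so that exact artifact belongs on the classpath rather than the newer split -s3/-core modules.

// Hedged sketch: pin the AWS SDK to the version hadoop-aws 2.7.x was compiled against.
// The 1.7.x SDK is a single artifact ("aws-java-sdk"); the split aws-java-sdk-s3/-core
// modules only exist from 1.9+ and carry the incompatible TransferManager API.
libraryDependencies += "org.apache.hadoop" % "hadoop-aws"   % "2.7.3"
libraryDependencies += "com.amazonaws"     % "aws-java-sdk" % "1.7.4"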

A newer SDK only arrives with Hadoop > 2.8, where it is bumped to 1.11.45. Why so late? Because that update forces a Jackson upgrade as well, which ends up breaking everything downstream.
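If versions do have to be mixed, one common mitigation in sbt is to force a single Jackson version across the whole transitive graph so Spark, the AWS SDK and spark-redshift at least agree on it. This is only a hedged sketch; the 2.6.5 shown is the Jackson line Spark 2.1.x ships with, and whether a given AWS SDK build tolerates it has to be verified.

// Hedged sketch: force one Jackson version onto every transitive dependency.
// Spark 2.1.x bundles Jackson 2.6.x, so overriding to 2.6.5 keeps Spark itself happy.
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core"   % "jackson-core"              % "2.6.5",
  "com.fasterxml.jackson.core"   % "jackson-databind"          % "2.6.5",
  "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.6.5"
)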


Welcome to the JAR-hell world of transitive dependencies, and let us all hope Java 9 sorts it out, although that will need someone (you?) to add all the relevant module declarations.
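A small, hedged aid for navigating that JAR hell (an assumption about the build setup, not something from the question): the sbt-dependency-graph plugin shows exactly which modules drag in which aws-java-sdk and Jackson versions.

// Hedged sketch, project/plugins.sbt: add sbt-dependency-graph, then run
//   sbt dependencyTree
// to see where the conflicting aws-java-sdk and Jackson versions come from.
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")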

Thanks @Steve Loughran, it really is JAR hell; I think I would rather swap Redshift for Mongo or Cassandra and avoid the fuss.

The Hadoop 3.0 beta is about to ship and uses a shaded copy of the AWS SDK (1.11.199); an easier-to-consume Hadoop 2.9 alpha follows it. Now that we have switched to the shaded artifact, the Jackson version AWS picks causes far fewer problems, but its API is still a moving target.
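A hedged sketch of what that setup might look like in sbt once the Hadoop 3.x artifacts are published (the version numbers are assumptions based on the comment above): hadoop-aws 3.x pairs with the shaded aws-java-sdk-bundle, which keeps Amazon's own Jackson off the application classpath.

// Hedged sketch: Hadoop 3.x uses the shaded aws-java-sdk-bundle, so the SDK's
// bundled Jackson is relocated and no longer clashes with Spark's.
libraryDependencies += "org.apache.hadoop" % "hadoop-aws"          % "3.0.0"
libraryDependencies += "com.amazonaws"     % "aws-java-sdk-bundle" % "1.11.199"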
libraryDependencies += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.8.4" 
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.79"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.11.79"
libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.8.1"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-redshift" % "1.11.78"
libraryDependencies += "com.databricks" % "spark-redshift_2.11" % "3.0.0-preview1"
libraryDependencies += "org.alluxio" % "alluxio-core-client" % "1.3.0"
libraryDependencies += "com.taxis99" %% "awsscala" % "0.7.3"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3"
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
libraryDependencies += "org.apache.spark" %% "spark-mllib" % sparkVersion
The Spark code reads the source table over JDBC and writes the result to Redshift:

val df = spark.read.jdbc(url_read, "public.test", prop).as[Schema.Message.Raw]
  .filter("message != ''")
  .filter("from_id >= 0")
  .limit(100)


df.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://test.XXX.redshift.amazonaws.com:5439/test?user=test&password=testXXXXX")
  .option("dbtable", "table_test")
  .option("tempdir", "s3a://redshift_logs/")
  .option("forward_spark_s3_credentials", "true")
  .option("tempformat", "CSV")
  .option("jdbcdriver", "com.amazon.redshift.jdbc42.Driver")
  .mode(SaveMode.Overwrite)
  .save()
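As a stop-gap along the lines of the "older s3n URLs" suggestion above, here is a hedged sketch of the same write with the temp directory moved from s3a:// to s3n://. The s3n connector in Hadoop 2.7 is backed by JetS3t rather than the AWS SDK, so it sidesteps the TransferManager clash; the bucket name and the environment-variable credential handling are placeholders, not taken from the question.

// Hedged sketch: s3n uses JetS3t instead of the AWS SDK, so it does not collide
// with whatever SDK version spark-redshift pulls in. Credentials go into the
// standard Hadoop s3n properties; the env-var names are only an example.
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

df.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://test.XXX.redshift.amazonaws.com:5439/test?user=test&password=testXXXXX")
  .option("dbtable", "table_test")
  .option("tempdir", "s3n://redshift_logs/")   // s3n instead of s3a
  .option("forward_spark_s3_credentials", "true")
  .option("tempformat", "CSV")
  .option("jdbcdriver", "com.amazon.redshift.jdbc42.Driver")
  .mode(SaveMode.Overwrite)
  .save()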