Apache Spark + sbt assembly: "deduplicate: different file contents found in the following"


I run a Spark application and want to package the test classes into the fat jar as well. Strangely, running "sbt assembly" succeeds, but running "sbt test:assembly" fails.

I tried it, but it did not work in my case.

sbt version: 0.13.8

build.sbt:

import sbtassembly.AssemblyPlugin._

name := "assembly-test"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  ("org.apache.spark" % "spark-core_2.10" % "1.3.1" % Provided)
    .exclude("org.mortbay.jetty", "servlet-api").
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-logging", "commons-logging").
    exclude("com.esotericsoftware.minlog", "minlog").exclude("com.codahale.metrics", "metrics-core"),
  "org.json4s" % "json4s-jackson_2.10" % "3.2.10" % Provided,
  "com.google.inject" % "guice" % "4.0"
)

Project.inConfig(Test)(assemblySettings)
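
The last line, Project.inConfig(Test)(assemblySettings), is what makes "sbt test:assembly" available in the first place. Note, though, that settings defined only in the default scope are not automatically visible to the Test configuration. As a hypothetical illustration (not from the original post), an additional assembly setting would be scoped into Test the same way, using the same deprecated 0.13-era key names as above:

// Hypothetical example: scope an extra sbt-assembly setting into Test
// so that `sbt test:assembly` picks it up.
Project.inConfig(Test)(Seq(
  jarName in assembly := name.value + "-test.jar"
))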

You have to define a merge strategy for the assembly, as I did below for my Spark application:

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "google", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
    case "META-INF/mailcap" => MergeStrategy.last
    case "META-INF/mimetypes.default" => MergeStrategy.last
    case "plugin.properties" => MergeStrategy.last
    case "log4j.properties" => MergeStrategy.last
    case x => old(x)
  }
}
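
Since the question scopes assembly into the Test configuration, the merge strategy presumably has to be visible to "test:assembly" as well; one way to do that (a sketch under that assumption, not part of the original answer) is to define it inside Project.inConfig(Test), using the same old-style keys:

// Assumption: the same duplicate files show up under test:assembly, so the
// strategy is repeated inside the Test configuration (case list abbreviated).
Project.inConfig(Test)(Seq(
  mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
    {
      case "log4j.properties" => MergeStrategy.last // keep one copy of duplicated configs
      case x => old(x)
    }
  }
))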

In addition to Wesley Milano's answer: the code needs a slight modification for newer versions of the sbt-assembly plugin (i.e. 0.13.0), in case anyone is wondering about the deprecation warnings:

assemblyMergeStrategy in assembly := {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "google", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
    case "META-INF/mailcap" => MergeStrategy.last
    case "META-INF/mimetypes.default" => MergeStrategy.last
    case "plugin.properties" => MergeStrategy.last
    case "log4j.properties" => MergeStrategy.last
    case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
}
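
For completeness, on sbt 1.x the "in" scoping syntax is itself deprecated in favor of the slash syntax, so the same strategy would look roughly like this (a sketch with the case list abbreviated, not verified against every plugin version):

assembly / assemblyMergeStrategy := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case "about.html" => MergeStrategy.rename
  case x =>
    // fall back to the plugin's default strategy for everything else
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}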

Putting all of this in the sbt file, together with a few more "exclude(...)" clauses, produces the jar with the test classes included. However, I found that "provided" does not work here: "provided" is only appropriate when the Spark application is submitted via spark-submit. If you run the Spark application directly, do not use "provided".

I have been using Scala for over a year and I have no idea what this code does, but the important thing is that it works. Thanks!

@FelipeAlmeida you seem quite experienced with Spark, so I was wondering if you could help me a bit... I am trying to create a jar file from my SBT project so I can run it. Do you know how I could do that?

@1290 Sure. I have written a post about this:

Spark 2.x.x requires a slight modification to this solution:
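
Regarding the "provided" remark above: a commonly cited workaround (it appears in the sbt-assembly documentation) is to keep Spark as Provided for the fat jar while putting it back on the classpath for "sbt run" during local development. In 0.13-style syntax that looks roughly like:

// Re-enable Provided dependencies (e.g. spark-core) for `sbt run` only;
// they remain excluded from the fat jar produced by `sbt assembly`.
run in Compile := Defaults.runTask(
  fullClasspath in Compile, // the compile classpath still contains Provided deps
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated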