Is Apache Spark recompiling classes from source into its jars, breaking sbt-assembly's merge?


Trying to build a fat jar with sbt fails with errors like this:

java.lang.RuntimeException: deduplicate: different file contents found in the following:
C:\Users\db\.ivy2\cache\org.apache.spark\spark-network-common_2.10\jars\spark-network-common_2.10-1.6.3.jar:com/google/common/base/Function.class
C:\Users\db\.ivy2\cache\com.google.guava\guava\bundles\guava-14.0.1.jar:com/google/common/base/Function.class
There are quite a few such classes; this is just one example. Guava 14.0.1 is the version that supplies Function.class in both jars:

[info]  +-com.google.guava:guava:14.0.1
...
[info]  | | +-com.google.guava:guava:14.0.1
so this is not a case of sbt/Ivy failing to pick the newer of two competing versions. Yet the copies of the class inside the two jars differ in size and date, which presumably is what triggers the error above:

$ jar tvf /c/Users/db/.ivy2/cache/org.apache.spark/spark-network-common_2.10/jars/spark-network-common_2.10-1.6.3.jar | grep "com/google/common/base/Function.class"
   549 Wed Nov 02 16:03:20 CDT 2016 com/google/common/base/Function.class

$ jar tvf /c/Users/db/.ivy2/cache/com.google.guava/guava/bundles/guava-14.0.1.jar  | grep "com/google/common/base/Function.class"
   543 Thu Mar 14 19:56:52 CDT 2013 com/google/common/base/Function.class
It looks as though Apache is recompiling Function.class from source rather than including the class as originally compiled. Is that a correct understanding of what is going on here? For now the recompiled classes can be excluded via sbt, but is there a way to build the jar without explicitly excluding every jar that contains recompiled source? Explicitly excluding jars, as in the snippet below, certainly feels like the wrong path:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3"
  excludeAll(
    ExclusionRule(organization = "com.twitter"),
    ExclusionRule(organization = "org.apache.spark", name = "spark-network-common_2.10"),
    ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-client"),
    ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-hdfs"),
    ExclusionRule(organization = "org.tachyonproject", name = "tachyon-client"),
    ExclusionRule(organization = "commons-beanutils", name = "commons-beanutils"),
    ExclusionRule(organization = "commons-collections", name = "commons-collections"),
    ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-yarn-api"),
    ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-yarn-common"),
    ExclusionRule(organization = "org.apache.curator", name = "curator-recipes")
  ),
libraryDependencies += "org.apache.spark" %% "spark-network-common" % "1.6.3" exclude("com.google.guava", "guava"),
libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.6.3",
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging-slf4j" % "2.1.2",
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" exclude("com.google.guava", "guava"),
libraryDependencies += "com.google.guava" % "guava" % "14.0.1",
libraryDependencies += "org.json4s" %% "json4s-native" % "3.2.11",
libraryDependencies += "org.json4s" %% "json4s-ext" % "3.2.11",
libraryDependencies += "com.rabbitmq" % "amqp-client" % "4.1.1",
libraryDependencies += "commons-codec" % "commons-codec" % "1.10",
If explicit exclusion is the wrong path, what would a cleaner approach be?
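
For what it's worth, sbt-assembly also lets you resolve deduplicate conflicts with a merge strategy instead of excluding whole jars. A minimal sketch, assuming sbt-assembly 0.14.x (where assemblyMergeStrategy and PathList are available in build.sbt), might look like this; it keeps the first copy of each conflicting Guava class rather than removing the duplicate dependency:

assemblyMergeStrategy in assembly := {
  // Keep whichever copy of a duplicated Guava class appears first on the classpath,
  // e.g. com/google/common/base/Function.class from spark-network-common vs. guava-14.0.1.
  case PathList("com", "google", "common", xs @ _*) => MergeStrategy.first
  case x =>
    // Defer to the plugin's default strategy for everything else.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}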


The cleaner approach is to not package spark-core at all. When you install Spark on the target machine, it is available there and is available to your application at runtime (you will usually find the jars under /usr/lib/spark/jars).


You should mark these Spark dependencies as provided (i.e. % "provided"). That should spare you many of the conflicts caused by packaging those jars.
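
As a sketch of what that could look like for the dependencies listed in the question (the exact set to mark as provided depends on what the target Spark installation actually ships):

// Spark artifacts stay on the compile classpath but are not packaged into the
// assembly; the Spark installation on the target machine supplies them at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "1.6.3" % "provided",
  "org.apache.spark" %% "spark-graphx" % "1.6.3" % "provided"
)

Dependencies in the provided scope still compile as before; sbt-assembly simply leaves them, along with their transitive Guava and Hadoop jars, out of the fat jar, so the deduplicate conflict never reaches the assembly step.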

That's what I do for the applications I have running under Spark in the cloud. Some of the same code is also used as part of a GUI, though, and there the container doesn't provide anything. Maybe this is still the better way to build the jar for the GUI?

@DonBranson I see. I definitely wouldn't recommend packaging Spark as part of any uber jar, since Spark is huge and it will drag you into dependency hell. Instead, I would make sure the container has those dependencies provided by whatever configuration management is responsible for unpacking the Spark jars into the container.

It sure is a hot mess of dependencies. There's no container for the GUI, though. I need to build a standalone artifact I can share with non-developers. Is there a way to do that? Maybe I need to build a zip containing all the jars, plus a script that puts them all on a classpath.

Thanks, giving it a try.

That did work. I end up with 50-plus jars to throw in there, which isn't pretty, but it's a first step. Step two: make it beautiful.