Deduplication in Scala build.sbt


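There are conflicting jline jar files on the include path, and they need to be deduplicated. My attempt is shown below.

I am building a combined Spark and Kafka fat jar, and the jline jar files are duplicated.

This is the build.sbt file: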

import sbt._
import sbt.Keys._
import java.io.File
import AssemblyKeys._

name := "kafkascala"

version      := "0.1.0-SNAPSHOT"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "0.9.1",
  "org.apache.spark" % "spark-examples_2.10" % "0.9.1",
  "org.apache.spark" % "spark-tools_2.10" % "0.9.1",
  "org.apache.kafka" % "kafka_2.10" % "0.8.1.1" intransitive() withSources()
)

resolvers += "Apache repo" at "https://repository.apache.org/content/repositories/releases"

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
    case PathList("maven","jline","jline","pom.properties") => MergeStrategy.discard
    case x => old(x)
  }
}

Here is the part of build.sbt that tries to resolve the problem:

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
    case PathList("maven","jline","jline","pom.properties") => MergeStrategy.discard
    case x => old(x)
  }
}

Your PathList is wrong: "META-INF" is missing in front of "maven". Something like this should work:

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
    case PathList("META-INF", "maven","jline","jline", "pom.properties" ) => MergeStrategy.discard
    case PathList("META-INF", "maven","jline","jline", "pom.xml" ) => MergeStrategy.discard
    case x => old(x)
  } 
}
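
Or, a little more concisely (excluding both pom.properties and pom.xml in a single case):

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
    case PathList("META-INF", "maven","jline","jline", ps) if ps.startsWith("pom") => MergeStrategy.discard
    case x => old(x)
  }
}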
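
To see why the original pattern never fired, it may help to look at how a fixed-arity extractor pattern matches the segments of a jar entry path. The sketch below is plain Scala and does not depend on sbt. `Segments` is a hypothetical stand-in written for illustration and is not sbt-assembly's `PathList`. It assumes only that the path is split on "/" and that every segment must line up with the pattern.

```scala
// Hypothetical stand-in for a path extractor; NOT sbt-assembly's PathList.
// It splits an entry path on "/" so patterns can match segment by segment.
object Segments {
  def unapplySeq(path: String): Option[Seq[String]] =
    Some(path.split("/").toList)
}

object PathMatchDemo {
  // Pattern from the question: four segments, starting at "maven".
  def original(path: String): Boolean = path match {
    case Segments("maven", "jline", "jline", "pom.properties") => true
    case _ => false
  }

  // Corrected pattern: five segments, starting at "META-INF".
  def corrected(path: String): Boolean = path match {
    case Segments("META-INF", "maven", "jline", "jline", name)
        if name.startsWith("pom") => true
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    val entry = "META-INF/maven/jline/jline/pom.properties"
    println(original(entry))  // false: the entry has five segments, the pattern four
    println(corrected(entry)) // true
  }
}
```

Because the pattern has a fixed number of arguments, it matches only when the number of segments and every leading segment agree. That is why the missing "META-INF" made the whole case unreachable.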

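I have come to the conclusion that the defaultMergeStrategy in sbt causes deduplicate errors unnecessarily. According to the sbt-assembly author, failing was chosen over picking the first file because they believe users should make that decision explicitly. But most users do not even know what the error means, and it takes them a long time to work out that changing deduplicate to first simply makes the build work. I think failing the build is a mistake. This is what defaultMergeStrategy should look like: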

// Strat copied from defaultMergeStrategy with the 
// "fail and confuse the hell out the user" lines changed to
// "just bloody work and stop pissing everyone off"
mergeStrategy in assembly <<= (mergeStrategy in assembly) ((old) => {
  case x if Assembly.isConfigFile(x) =>
    MergeStrategy.concat
  case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
    MergeStrategy.rename
  case PathList("META-INF", xs @ _*) =>
    (xs map {_.toLowerCase}) match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
        MergeStrategy.discard
      case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs =>
        MergeStrategy.discard
      case "services" :: xs =>
        MergeStrategy.filterDistinctLines
      case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
        MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.first // Changed deduplicate to first
    }
  case PathList(_*) => MergeStrategy.first // added this line
})
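
The `<<=` operator used above is not available in sbt 1.x. For readers on a newer toolchain, the same "fall back to first instead of failing" idea might be sketched as follows. It assumes sbt 1.x with a recent sbt-assembly release, where the key is named `assemblyMergeStrategy`. This is an untested build fragment, not a drop-in replacement for the full strategy above.

```scala
// build.sbt fragment; assumes sbt 1.x and a recent sbt-assembly plugin.
assembly / assemblyMergeStrategy := {
  case x =>
    // Ask the previously configured strategy what it would do for this path.
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    val chosen = oldStrategy(x)
    // Where the default would fail on differing duplicates, keep the first copy.
    if (chosen == MergeStrategy.deduplicate) MergeStrategy.first else chosen
}
```

Delegating to the previous strategy keeps the default handling of config files, service descriptors and manifests. Only the cases that would otherwise abort the build are overridden.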

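Great! You were right. You now (after I accepted and upvoted your answer) have almost the same reputation as I do.

I am running into a whole series of deduplicate errors. Is there a way to see them all at once, rather than running sbt assembly over and over?

It would probably be better to just exclude the duplicate dependencies as described here, but I think one way is to use a single case like this:

case x =>
  val oldStrategy = old(x)
  if (oldStrategy == MergeStrategy.deduplicate) MergeStrategy.discard else oldStrategy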