Apache Spark: error when using the Delta Lake source on Spark 2.4 (HDInsight)


The same code works in Databricks but fails in HDInsight with the error below. I have already added the Delta Lake and hadoop-azure libraries to the classpath:

io.delta:delta-core_2.11:0.5.0,org.apache.hadoop:hadoop-azure:3.1.3

ERROR ApplicationMaster [Driver]: User class threw exception: com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: com/fasterxml/jackson/module/scala/experimental/ScalaObjectMapper$class
com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: com/fasterxml/jackson/module/scala/experimental/ScalaObjectMapper$class
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4873)
    at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:740)
    at org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:712)
    at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:169)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at io.delta.tables.DeltaTable$.forPath(DeltaTable.scala:635)
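For reference, a minimal sketch of the kind of read that hits this code path (the asker's actual code is not shown; the app name and storage path below are placeholders):

    import io.delta.tables.DeltaTable
    import org.apache.spark.sql.SparkSession

    object DeltaReadRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("delta-read-repro").getOrCreate()

        // Any Delta read goes through DeltaLog.forTable, which is where the
        // missing Jackson class surfaces on HDInsight. The path is a placeholder.
        val table = DeltaTable.forPath(spark, "abfss://container@account.dfs.core.windows.net/delta/events")
        table.toDF.show()
      }
    }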
    

The root cause is a conflict between the version of the Jackson JSON library packaged with HDInsight (and used by Spark) and the version Delta Lake expects.

There are two ways to solve this:

  • Package the Jackson JSON 2.6.7 dependencies into your application (with the maven-shade-plugin or a Scala/sbt assembly); see the build sketch after this list.
  • If you are using a Jupyter notebook, set the Spark configuration shown at the bottom of this answer (after the comments).
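For the first option, here is a minimal sketch assuming an sbt build with the sbt-assembly plugin (sbt 1.x syntax; the Spark version, artifact names, and shaded package prefix are illustrative, not from the original answer):

    // build.sbt sketch: pin Jackson to the version Spark 2.4 ships and bundle it
    // in the assembly jar so the copy on the HDInsight cluster cannot clash.
    name := "delta-on-hdinsight"
    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-sql"    % "2.4.4" % Provided,
      "io.delta"          %% "delta-core"   % "0.5.0",
      "org.apache.hadoop"  % "hadoop-azure" % "3.1.3"
    )

    // Force the Jackson version Spark 2.4 expects (see the comments below).
    dependencyOverrides ++= Seq(
      "com.fasterxml.jackson.core"    % "jackson-databind"     % "2.6.7.1",
      "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.7.1"
    )

    // Optionally relocate Jackson inside the fat jar (sbt-assembly shade rules),
    // so the cluster-provided Jackson is never mixed with the bundled one.
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.fasterxml.jackson.**" -> "shaded.jackson.@1").inAll
    )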

Comments:

  • Can you check which version of jackson-module-scala is on your classpath? It looks like you are using an incompatible version.
  • I am using 2.11.1: com.fasterxml.jackson.module : jackson-module-scala_2.11 : 2.11.1 (test scope).
  • Spark 2.4.6 uses 2.6.7.1, so it is best to use the same version; com/fasterxml/jackson/module/scala/experimental/ScalaObjectMapper no longer exists in jackson-module-scala 2.11.1.
  • Thanks!! Tried that, but it is still the same problem. The same error also appears in spark-shell.
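The Spark configuration the answer refers to for the Jupyter notebook option is below. ${PATH} is a placeholder for wherever the Jackson 2.6.7.x jars are staged on the cluster, spark.jars.packages pulls in Delta Lake itself, and spark.driver.userClassPathFirst makes the driver prefer the listed Jackson jars over the versions HDInsight ships: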
    {"conf":
     {"spark.jars.packages": "io.delta:delta-core_2.11:0.5.0", 
        "spark.driver.extraClassPath":
         "${PATH}/jackson-module-scala_2.11-2.6.7.1.jar;${PATH}/jackson-annotations-2.6.7.jar;
          ${PATH}/jackson-core-2.6.7.jar;
          ${PATH}/jackson-databind-2.6.7.1.jar;
          ${PATH}/jackson-module-paranamer-2.6.7.jar",
       "spark.executor.extraClassPath":
         "${PATH}/jackson-module-scala_2.11-2.6.7.1.jar;${PATH}/jackson-annotations-2.6.7.jar;
          ${PATH}/jackson-core-2.6.7.jar;${PATH}/jackson-databind-2.6.7.1.jar;
          ${PATH}/jackson-module-paranamer-2.6.7.jar",
       "spark.driver.userClassPathFirst":true}}