Apache Spark: how to use Delta in the Spark 3.0 preview?


Spark 3.0 fails to save a DataFrame as a Delta table in HDFS.

  • Scala version 2.12.10
  • Spark 3.0 preview
This works in 2.4.4, but the partitions are not created.

Input sample:

Vehicle_id|model|brand|year|miles|intake_date_time
v0001H|verna|Hyundai|2011|5000|2018-01-20 06:30:00
v0001F|Eco-sport|Ford|2013|4000|2018-02-10 06:30:00
v0002F|Endeavour|Ford|2011|8000|2018-04-12 06:30:00
v0001L|Gallardo|Lambhorghini|2013|2000|2018-05-16 06:30:00
Error:

com.google.common.util.concurrent.ExecutionError: java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.classForName(Ljava/lang/String;)Ljava/lang/Class;
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
  at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
  at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
  at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:714)
  at org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:676)
  at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:124)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:71)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:69)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:87)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:189)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:227)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:224)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:185)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:110)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:109)
  at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:829)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$4(SQLExecution.scala:100)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:829)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:309)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:236)
  ... 47 more
Caused by: java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.classForName(Ljava/lang/String;)Ljava/lang/Class;
  at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore(LogStore.scala:122)
  at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore$(LogStore.scala:120)
  at org.apache.spark.sql.delta.DeltaLog.createLogStore(DeltaLog.scala:58)
  at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore(LogStore.scala:117)
  at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore$(LogStore.scala:115)
  at org.apache.spark.sql.delta.DeltaLog.createLogStore(DeltaLog.scala:58)
  at org.apache.spark.sql.delta.DeltaLog.<init>(DeltaLog.scala:79)
  at org.apache.spark.sql.delta.DeltaLog$$anon$3.$anonfun$call$2(DeltaLog.scala:718)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
  at org.apache.spark.sql.delta.DeltaLog$$anon$3.$anonfun$call$1(DeltaLog.scala:718)
  at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
  at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
  at org.apache.spark.sql.delta.DeltaLog$.recordOperation(DeltaLog.scala:645)
  at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:103)
  at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:89)
  at org.apache.spark.sql.delta.DeltaLog$.recordDeltaOperation(DeltaLog.scala:645)
  at org.apache.spark.sql.delta.DeltaLog$$anon$3.call(DeltaLog.scala:717)
  at org.apache.spark.sql.delta.DeltaLog$$anon$3.call(DeltaLog.scala:714)
  at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
  at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
  at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
  at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
  ... 71 more

In the Spark 2.4.4 REPL it writes, but without the partitions.
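A possible explanation for the missing partitions on 2.4.4 (an assumption; the thread does not confirm it): overwriting an existing Delta table keeps the table's original partitioning unless the schema-overwrite option is also set. A minimal sketch using the asker's deltaTableInput1 from the code at the end of the post; verify that your Delta version supports the option:

// Hypothetical fix for the "no partitions" symptom on Spark 2.4.4: let the
// overwrite replace the existing table layout, not just its data.
deltaTableInput1.write
                .mode("overwrite")
                .option("overwriteSchema", "true")   // per the Delta Lake docs
                .partitionBy("brand", "model", "year", "month")
                .format("delta")
                .save("path")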

Found the Spark 3.0 error reported at:

Spark 3.0 is very different from Spark 2.4, hence it does not work.

But is there a branch?
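The NoSuchMethodError in the trace is consistent with that: delta-core 0.4.0 was compiled against the Spark 2.4 signature of Utils.classForName, and that signature changed in the 3.0 preview. A simplified comparison (paraphrased from the Spark sources; treat the exact 3.0 form as an approximation):

// Spark 2.4.x, what delta-core 0.4.0 was compiled against:
//   def classForName(className: String): Class[_]
// Spark 3.0 preview, extra parameters change the binary signature:
//   def classForName[C](className: String, initialize: Boolean = true,
//                       noSparkClassLoader: Boolean = false): Class[C]
// On a 3.0 classpath the 2.4-era method reference in Delta's LogStore
// cannot be resolved at runtime, hence the NoSuchMethodError.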


Comments:

  • How do you execute the code: via spark-submit or via java -jar? What is the value of JAVA_HOME? Which version of Delta Lake is it, 0.4.0 or master?
  • spark-shell; it runs in spark-shell. The Delta version is compile group: 'io.delta', name: 'delta-core_2.12', version: '0.4.0'.
  • Since Spark 3.0 has not been released yet, whatever happens to Delta on any 3.0 preview can only be considered a bug and does not deserve any question on StackOverflow. Close it.
  • So a delta-core that supports Spark 3.0 has not been deployed to the Maven repo yet?
  • Yes. Just today. But it cannot be found here: . I am trying.
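For completeness, a minimal sketch of pinning the combination the comments report as working, Spark 2.4.4 with delta-core 0.4.0 (standard spark-shell and sbt forms, shown as assumptions matching the Gradle coordinate above):

// Launch a REPL with the matching Delta build (Scala 2.12 Spark distribution):
//   spark-shell --packages io.delta:delta-core_2.12:0.4.0
// sbt equivalent of the Gradle coordinate quoted in the comments:
//   libraryDependencies += "io.delta" %% "delta-core" % "0.4.0"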
// Reading: load the pipe-delimited CSV and rebuild intake_date_time.
// The substring shuffle reorders a dd-MM-yyyy HH:mm:ss string into
// yyyy-MM-dd HH:mm:ss before casting to TIMESTAMP (a worked check follows
// this block). Note that "month" is selected for partitioning even though
// the input sample above does not show that column.
val deltaTableInput1 = spark.read
                            .format("com.databricks.spark.csv")   // the built-in "csv" alias works too
                            .option("header", "true")
                            .option("delimiter", "|")
                            .option("inferSchema", "true")
                            .load("file")
                            .selectExpr("Vehicle_id", "model", "brand", "year", "month", "miles",
                              "CAST(concat(substring(intake_date_time,7,4),concat(substring(intake_date_time,3,4),concat(substring(intake_date_time,1,2),substring(intake_date_time,11,9)))) AS TIMESTAMP) as intake_date_time")
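A quick sanity check of the substring re-assembly above, assuming the raw file stores intake_date_time as dd-MM-yyyy HH:mm:ss (the literal below is a hypothetical sample, not taken from the question):

import spark.implicits._

// Applying the same expression to a literal shows the reordering at work.
Seq("20-01-2018 06:30:00").toDF("t")
  .selectExpr("CAST(concat(substring(t,7,4),concat(substring(t,3,4),concat(substring(t,1,2),substring(t,11,9)))) AS TIMESTAMP) AS intake_date_time")
  .show(false)
// +-------------------+
// |intake_date_time   |
// +-------------------+
// |2018-01-20 06:30:00|
// +-------------------+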

// Writing: overwrite the target path as a Delta table partitioned by
// brand/model/year/month ("path" is a placeholder for the HDFS location).
deltaTableInput1.write
                .mode("overwrite")
                .partitionBy("brand", "model", "year", "month")
                .format("delta")
                .save("path")
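If the write succeeds, one way to verify the partitioning is to read the table back ("path" is the same placeholder as above; the val name is illustrative). Partition values should also appear as brand=.../model=.../year=.../month=... directories under the table path:

// Read the Delta table back and inspect schema and data.
val written = spark.read.format("delta").load("path")
written.printSchema()
written.show(5, false)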