Apache Spark: how to use Delta with the Spark 3.0 preview?

Spark 3.0 fails to save a DataFrame as a Delta table in HDFS.

- Scala version 2.12.10
- Spark 3.0 preview

It works in 2.4.4, although the partitions are not created there.

Sample input:
Vehicle_id|model|brand|year|miles|intake_date_time
v0001H|verna|Hyundai|2011|5000|2018-01-20 06:30:00
v0001F|Eco-sport|Ford|2013|4000|2018-02-10 06:30:00
v0002F|Endeavour|Ford|2011|8000|2018-04-12 06:30:00
v0001L|Gallardo|Lambhorghini|2013|2000|2018-05-16 06:30:00
Error:

com.google.common.util.concurrent.ExecutionError: java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.classForName(Ljava/lang/String;)Ljava/lang/Class;
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
  at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
  at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
  at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:714)
  at org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:676)
  at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:124)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:71)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:69)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:87)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:189)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:227)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:224)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:185)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:110)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:109)
  at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:829)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$4(SQLExecution.scala:100)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:829)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:309)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:236)
  ... 47 elided
Caused by: java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.classForName(Ljava/lang/String;)Ljava/lang/Class;
  at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore(LogStore.scala:122)
  at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore$(LogStore.scala:120)
  at org.apache.spark.sql.delta.DeltaLog.createLogStore(DeltaLog.scala:58)
  at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore(LogStore.scala:117)
  at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore$(LogStore.scala:115)
  at org.apache.spark.sql.delta.DeltaLog.createLogStore(DeltaLog.scala:58)
  at org.apache.spark.sql.delta.DeltaLog.<init>(DeltaLog.scala:79)
  at org.apache.spark.sql.delta.DeltaLog$$anon$3.$anonfun$call$2(DeltaLog.scala:718)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
  at org.apache.spark.sql.delta.DeltaLog$$anon$3.$anonfun$call$1(DeltaLog.scala:718)
  at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
  at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
  at org.apache.spark.sql.delta.DeltaLog$.recordOperation(DeltaLog.scala:645)
  at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:103)
  at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:89)
  at org.apache.spark.sql.delta.DeltaLog$.recordDeltaOperation(DeltaLog.scala:645)
  at org.apache.spark.sql.delta.DeltaLog$$anon$3.call(DeltaLog.scala:717)
  at org.apache.spark.sql.delta.DeltaLog$$anon$3.call(DeltaLog.scala:714)
  at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
  at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
  at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
  at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
  ... 71 more
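The NoSuchMethodError arises because delta-core 0.4.0 was compiled against Spark 2.4's `Utils.classForName(String): Class[_]`, and that method signature changed in Spark 3.0, so the method the Delta jar was linked against no longer exists at runtime. A minimal pure-Scala sketch of this failure mode using only JDK reflection (`hasMethod` is a hypothetical helper, and the `java.lang.String` probes below stand in for the Spark classes):

```scala
// Sketch: check at runtime whether a class exposes a method with a given signature.
// A NoSuchMethodError is the linkage-time analogue of this check returning false.
def hasMethod(className: String, method: String, params: Class[_]*): Boolean =
  try {
    Class.forName(className).getMethod(method, params: _*)
    true
  } catch {
    case _: ClassNotFoundException | _: NoSuchMethodException => false
  }
```

On a Spark 3.0 classpath the old one-argument `classForName` lookup fails in exactly this way, and Guava's loading cache then wraps the error in the `ExecutionError` seen at the top of the trace.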
In the Spark 2.4.4 REPL it does write, though without the partitions.
I found the same Spark 3.0 error reported at:
Comments:

Spark 3.0 is very different from Spark 2.4, so it won't work. But is there a branch?

How are you executing the code? Via spark-submit, or via java -jar? What is JAVA_HOME set to? And which version of Delta Lake is this, 0.4.0 or master?

spark-shell; it runs in spark-shell. compile group: 'io.delta', name: 'delta-core_2.12', version: '0.4.0' is the Delta version.

Since Spark 3.0 has not been released yet, whatever happens to Delta on any 3.0 preview can only be treated as a bug, and is not worth a question on StackOverflow. Close it.

So delta-core with Spark 3.0 support has not been deployed to the Maven repo yet?

Yes. Just today. But it can't be found here: . I'm working on it.
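Once a delta-core build compiled against Spark 3.x is published, pointing the shell at matching coordinates is the usual fix. A config sketch (note the assumption: Delta 0.4.0 targets Spark 2.4, and the first release built for Spark 3.0 was delta-core 0.7.0, which did not yet exist when this thread was written; with the 3.0 preview only branch/snapshot builds were available):

```shell
# Launch spark-shell with a Delta build that matches the Spark major version.
# io.delta:delta-core_2.12:0.7.0 is the first release compiled against Spark 3.0.
spark-shell --packages io.delta:delta-core_2.12:0.7.0
```

The general rule is that the Scala suffix (_2.12) and the Spark major version of the Delta artifact must both match the running cluster.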
// Reading: the built-in "csv" source replaces com.databricks.spark.csv in Spark 2+.
val deltaTableInput1 = spark.read
  .format("csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("inferSchema", "true")
  .load("file")
  .selectExpr(
    "Vehicle_id", "model", "brand", "year", "miles",
    // The sample data is already yyyy-MM-dd HH:mm:ss, so a direct cast works;
    // reorder the string first only if the raw feed uses dd-MM-yyyy.
    "CAST(intake_date_time AS TIMESTAMP) AS intake_date_time",
    // The input has no month column; derive it so partitionBy("month") below works.
    "MONTH(CAST(intake_date_time AS TIMESTAMP)) AS month")

// Writing
deltaTableInput1.write
  .mode("overwrite")
  .partitionBy("brand", "model", "year", "month")
  .format("delta")
  .save("path")
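The nested substring/concat expression in the original selectExpr suggests the raw feed's timestamps arrive as dd-MM-yyyy HH:mm:ss and need reordering before `CAST(... AS TIMESTAMP)` will accept them. A plain-Scala sketch of that reordering with java.time (`toIso` is a hypothetical helper, independent of Spark):

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

val inFmt  = DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss")
val outFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Reorder a dd-MM-yyyy string into the ISO-like ordering Spark's timestamp cast expects.
def toIso(s: String): String = LocalDateTime.parse(s, inFmt).format(outFmt)
```

Inside Spark itself the equivalent one-liner is `to_timestamp(col("intake_date_time"), "dd-MM-yyyy HH:mm:ss")`, which avoids the string surgery altogether.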