
Scala Databricks: unable to write to a Delta location from a DataFrame


I want to rename a column of a Databricks Delta table.

So I did the following:

// Read old table data
val old_data_DF = spark.read.format("delta")
  .load("dbfs:/mnt/main/sales")

// Created a new DF with a renamed column
val new_data_DF = old_data_DF
  .withColumnRenamed("column_a", "metric1")
  .select("*")

// Dropped and recreated the Delta files location
dbutils.fs.rm("dbfs:/mnt/main/sales", true)
dbutils.fs.mkdirs("dbfs:/mnt/main/sales")

// Trying to write the new DF to the location
new_data_DF.write
  .format("delta")
  .partitionBy("sale_date_partition")
  .save("dbfs:/mnt/main/sales")
Here, at the last step, I get an error when writing to Delta:

java.io.FileNotFoundException: dbfs:/mnt/main/sales/sale_date_partition=2019-04-29/part-00000-769.c000.snappy.parquet
A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement
Apparently the data was deleted, and most likely I am missing something in the logic above. Right now the only place that still contains the data is new_data_DF. Writing to a location like dbfs:/mnt/main/sales_tmp also fails.

How can I write the data from new_data_DF to a Delta location?

In general, it is best to avoid using rm on Delta tables. Delta's transaction log prevents eventual-consistency issues in most cases; however, when you delete and recreate a table in a very short time, different versions of the transaction log can flicker in and out of existence.

Instead, I would suggest using the transactional primitives provided by Delta. For example, to overwrite the data in a table, you can:
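A minimal sketch of such an overwrite, with an illustrative DataFrame df and path:

// Overwrite the table's contents in a single atomic transaction,
// instead of deleting files and rewriting from scratch
df.write
  .format("delta")
  .mode("overwrite")
  .save("/delta/events")  // illustrative path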


If you have a table that has already been corrupted, you can use FSCK REPAIR TABLE to fix it.
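On Databricks this can be run through SQL; addressing a path-based table with the delta.`<path>` form below is an assumption about how the table is registered:

// Remove entries from the transaction log for files that no longer exist
spark.sql("FSCK REPAIR TABLE delta.`dbfs:/mnt/main/sales`")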

You can repair it as follows:

// Read old table data
val old_data_DF = spark.read.format("delta")
  .load("dbfs:/mnt/main/sales")

// Created a new DF with a renamed column
val new_data_DF = old_data_DF
  .withColumnRenamed("column_a", "metric1")
  .select("*")

// Write the new DF back to the same location
new_data_DF.write
  .format("delta")
  .mode("overwrite")                  // overwrites all existing data files
  .option("overwriteSchema", "true")  // this is the key line
  .partitionBy("sale_date_partition")
  .save("dbfs:/mnt/main/sales")

The overwriteSchema option creates new physical files using the latest schema produced by the transformation.
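As a quick sanity check (a sketch; the path and column names are the ones from the question), you can read the table back and confirm the rename:

// Reload the rewritten table and inspect its schema
val check_DF = spark.read.format("delta")
  .load("dbfs:/mnt/main/sales")
check_DF.printSchema()  // should now list `metric1` instead of `column_a`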

A DataFrame is just a reference; it does not contain any data. You should try writing the data to a different location and then swapping the table (sketched after these comments).

Overwrite alone does not let me rename a column; I get the following error:

AnalysisException: A schema mismatch detected when writing to the Delta table

You can use the overwriteSchema flag to signal that you also want to change the table's schema as part of the overwrite.
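A sketch of that "write elsewhere, then swap" idea, assuming a hypothetical sales_new path and a metastore table named sales:

// Write the renamed data to a fresh location first
new_data_DF.write
  .format("delta")
  .partitionBy("sale_date_partition")
  .save("dbfs:/mnt/main/sales_new")

// After verifying the new location, repoint the table at it
// rather than deleting and recreating the original directory
spark.sql("ALTER TABLE sales SET LOCATION 'dbfs:/mnt/main/sales_new'")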