How to query Delta Lake schema changes from the metadata in Apache Spark
I am reading the JSON Delta transaction log and trying to understand schema changes. Below is the printSchema output for the Delta log:
root
|-- add: struct (nullable = true)
| |-- dataChange: boolean (nullable = true)
| |-- modificationTime: long (nullable = true)
| |-- path: string (nullable = true)
| |-- size: long (nullable = true)
|-- commitInfo: struct (nullable = true)
| |-- isBlindAppend: boolean (nullable = true)
| |-- operation: string (nullable = true)
| |-- operationMetrics: struct (nullable = true)
| | |-- executionTimeMs: string (nullable = true)
| | |-- numFiles: string (nullable = true)
| | |-- numOutputBytes: string (nullable = true)
| | |-- numOutputRows: string (nullable = true)
| | |-- numSourceRows: string (nullable = true)
| | |-- numTargetFilesAdded: string (nullable = true)
| | |-- numTargetFilesRemoved: string (nullable = true)
| | |-- numTargetRowsCopied: string (nullable = true)
| | |-- numTargetRowsDeleted: string (nullable = true)
| | |-- numTargetRowsInserted: string (nullable = true)
| | |-- numTargetRowsUpdated: string (nullable = true)
| | |-- rewriteTimeMs: string (nullable = true)
| | |-- scanTimeMs: string (nullable = true)
| |-- operationParameters: struct (nullable = true)
| | |-- matchedPredicates: string (nullable = true)
| | |-- mode: string (nullable = true)
| | |-- notMatchedPredicates: string (nullable = true)
| | |-- partitionBy: string (nullable = true)
| | |-- predicate: string (nullable = true)
| |-- readVersion: long (nullable = true)
| |-- timestamp: long (nullable = true)
|-- metaData: struct (nullable = true)
| |-- createdTime: long (nullable = true)
| |-- format: struct (nullable = true)
| | |-- provider: string (nullable = true)
| |-- id: string (nullable = true)
| |-- partitionColumns: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- schemaString: string (nullable = true)
|-- protocol: struct (nullable = true)
| |-- minReaderVersion: long (nullable = true)
| |-- minWriterVersion: long (nullable = true)
|-- remove: struct (nullable = true)
| |-- dataChange: boolean (nullable = true)
| |-- deletionTimestamp: long (nullable = true)
| |-- extendedFileMetadata: boolean (nullable = true)
| |-- path: string (nullable = true)
| |-- size: long (nullable = true)
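Each commit in `_delta_log` is a JSON-lines file of actions, and a `metaData` action only appears in commits that set or change the table's metadata, including its schema. As a sketch (plain Python rather than Spark, and assuming the standard zero-padded commit file names), you can list the commits that carry a schema definition like this:

```python
import json
from pathlib import Path

def schema_change_commits(delta_log_dir):
    """Scan _delta_log/*.json commit files and yield (version, schemaString)
    for every commit that carries a metaData action."""
    for log_file in sorted(Path(delta_log_dir).glob("*.json")):
        # Commit files are named by version, e.g. 00000000000000000003.json
        version = int(log_file.stem)
        for line in log_file.read_text().splitlines():
            action = json.loads(line)
            if "metaData" in action:
                yield version, action["metaData"]["schemaString"]
```

The same scan could equally be done with `spark.read.json("<table>/_delta_log/*.json")` and a filter on `metaData IS NOT NULL`; the pure-Python version just makes the log structure explicit.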
If I look at the data, especially the metaData schema, there is no timestamp that tells you when a change occurred.
I would appreciate any help on how to get this kind of information, such as when a schema change happened. Should I be reading this data differently, or is there somewhere else that provides a history of the schema changes that occurred?
Thanks.

Comment: Do you mean changes to the table's schema, e.g. adding new columns?

Reply: Yes, I plan to enable autoMerge so that additional columns are added via Spark's updateAll() and insertAll(). So I want to create a view showing when the schema changed and what changed.
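The `metaData` action itself carries no timestamp, but the `commitInfo` action in the same commit file does, and `schemaString` is a JSON-encoded Spark StructType, so the two can be joined per commit and the field lists diffed between successive schema versions. (Delta also exposes `DESCRIBE HISTORY` / `DeltaTable.history()`, which gives commit timestamps and operations, but not a schema diff.) A minimal sketch of building such a change view, again in plain Python over the log files:

```python
import json
from pathlib import Path

def schema_history(delta_log_dir):
    """Pair each metaData action with its commit's commitInfo.timestamp,
    then diff top-level field names between successive schema versions."""
    snapshots = []
    for log_file in sorted(Path(delta_log_dir).glob("*.json")):
        version = int(log_file.stem)
        meta, ts = None, None
        for line in log_file.read_text().splitlines():
            action = json.loads(line)
            if "metaData" in action:
                meta = action["metaData"]
            if "commitInfo" in action:
                ts = action["commitInfo"].get("timestamp")
        if meta is not None:
            # schemaString is a JSON StructType: {"type": "struct", "fields": [...]}
            fields = {f["name"] for f in json.loads(meta["schemaString"])["fields"]}
            snapshots.append((version, ts, fields))
    changes = []
    for (_, _, f0), (v1, t1, f1) in zip(snapshots, snapshots[1:]):
        changes.append({"version": v1, "timestamp": t1,
                        "added": sorted(f1 - f0), "removed": sorted(f0 - f1)})
    return changes
```

This only compares top-level column names; type changes or nested-field changes would need a deeper diff of the parsed StructType, and in Spark the same result could be materialized as a view over the parsed log.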