Python: a DataFrame write to the same partition column that fails midway leaves incomplete data in the table

python dataframe pyspark

I have to write a DataFrame dynamically, using the month_end field as the partition column. Sample input df:

customer_id  |ph_num|month_end |
1            |123   |2020-10-31|
2            |456   |2020-10-31|
3            |789   |2020-10-31|
1            |654   |2020-10-31|

DataFrame write:

df.write.partitionBy("month_end").mode("append").parquet("hdfs/path/db_name/table_name/")

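The question: during the write to the partition column, if a failure occurs partway through because of some system problem, and we rerun the program so that it writes into the same partition folder again, will that folder contain both the old incomplete data and the new complete data? Or, after the rerun, will it contain only the new complete data?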
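To make the two possible outcomes concrete, here is a toy model in plain Python, not Spark itself: each row becomes one part file inside a `month_end=...` folder, and a simulated failure stops the write partway. It assumes the failed attempt leaves its part files in the final partition folder. Whether a real failed Spark job does that depends on the output committer in use, and this model does not decide it. The helper names `write_partition` and `rows_after_rerun` are hypothetical.

```python
import tempfile
import uuid
from pathlib import Path
from typing import List, Optional


def write_partition(table: Path, month_end: str, rows: List[str], mode: str,
                    fail_after: Optional[int] = None) -> None:
    """Toy stand-in (hypothetical) for df.write.partitionBy("month_end").mode(mode).

    Each row becomes one part file under table/month_end=<value>/.
    fail_after simulates a crash after that many files have been written.
    """
    part_dir = table / f"month_end={month_end}"
    if mode == "overwrite" and part_dir.exists():
        # Dynamic-overwrite style: clear only this month's partition folder.
        for f in part_dir.iterdir():
            f.unlink()
    part_dir.mkdir(parents=True, exist_ok=True)
    job_id = uuid.uuid4().hex  # unique per attempt, so appended files never clash
    for i, row in enumerate(rows):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated failure mid-write")
        (part_dir / f"part-{i:05d}-{job_id}").write_text(row)


def rows_after_rerun(mode: str) -> int:
    """Crash once after 2 of 4 rows, rerun in `mode`, count part files on disk."""
    rows = ["1|123", "2|456", "3|789", "1|654"]
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "table_name"
        try:
            write_partition(table, "2020-10-31", rows, mode, fail_after=2)
        except RuntimeError:
            pass  # the first attempt dies, leaving 2 part files behind
        write_partition(table, "2020-10-31", rows, mode)  # the rerun
        return len(list((table / "month_end=2020-10-31").iterdir()))


print(rows_after_rerun("append"))     # 6: 2 stale + 4 new
print(rows_after_rerun("overwrite"))  # 4: only the rerun's rows
```

Under that assumption, an append rerun ends up with stale and new rows side by side, while clearing just the one partition before rewriting leaves only the rerun's rows. The closest Spark setting to the second behaviour is `spark.sql.sources.partitionOverwriteMode=dynamic` (Spark 2.3+) combined with `mode("overwrite")`; under the default `static` mode, an overwrite to the table path replaces every month_end partition, not only the one being written.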

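Note: I have to run the program once a month, so I need to create the folder structure based on the month-end date being processed. That is why month_end is used as the partition column.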
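For reference, `partitionBy("month_end")` writes Hive-style subfolders named `month_end=<value>` under the output path. The sketch below computes which folder a given monthly run would target; the helpers `month_end_of` and `partition_dir` are hypothetical, and it assumes month_end is rendered as an ISO date such as `2020-10-31`.

```python
import calendar
from datetime import date
from pathlib import PurePosixPath


def month_end_of(d: date) -> date:
    """Last calendar day of d's month, e.g. 2020-10-15 -> 2020-10-31."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=last_day)


def partition_dir(table_path: str, month_end: date) -> str:
    """Hive-style folder that partitionBy("month_end") writes this month into."""
    return str(PurePosixPath(table_path) / f"month_end={month_end.isoformat()}")


run_month = month_end_of(date(2020, 10, 15))
print(partition_dir("hdfs/path/db_name/table_name/", run_month))
# hdfs/path/db_name/table_name/month_end=2020-10-31
```

Because each monthly run touches exactly one such folder, a rerun only needs to clean up or replace that folder, not the whole table.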