Scala: AWS Redshift Parquet COPY has an incompatible Parquet schema


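I am writing a DataFrame to Redshift, using a temporary S3 bucket and Parquet as the temporary format. Spark successfully writes the data to the temporary S3 bucket, but Redshift's attempt to COPY the data into the warehouse fails with the following error: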

error:  S3 Query Exception (Fetch)
 code:      15001
 context:   Task failed due to an internal error. File 'https://s3.amazonaws.com/...../part-00001-de882e65-a5fa-4e52-95fd-7340f40dea82-c000.parquet  has an incompatible Parquet schema for column 's3://bucket-dev-e
 query:     17567
 location:  dory_util.cpp:872
 process:   query0_127_17567 [pid=13630]

What am I doing wrong? How can I fix it?

Update 1

Here is the detailed error:

S3 Query Exception (Fetch). Task failed due to an internal error. 
File 'https://....d5e6c7a/part-00000-9ca1b72b-c5f5-4d8e-93ce-436cd9c3a7f1-c000.parquet  has an incompatible Parquet schema for column 's3://.....a-45f6-bd9c-d3d70d5e6c7a/manifest.json.patient_dob'. 
Column type: TIMESTAMP, Parquet schema:\noptional byte_array patient_dob [i:26 d:1 r:0]\n (s3://.......-45f6-bd9c-d3d70d5e6c7a/

Apache Spark version: 2.3.1

I also tried setting the following properties, without success:

writer
   .option("compression", "none")
   .option("spark.sql.parquet.int96TimestampConversion", "true")
   .option("spark.sql.parquet.int96AsTimestamp", "true")
   .option("spark.sql.parquet.writeLegacyFormat", "true")
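A note on the snippet above, offered as a hedged sketch rather than a confirmed fix: the spark.sql.parquet.* keys are Spark SQL session configuration properties, so passing them to the writer through .option(...) may not have the intended effect. As far as I know, int96AsTimestamp and int96TimestampConversion also govern how Parquet is read, not how it is written. The object name ParquetSessionConf below is hypothetical, and nothing here is expected by itself to change a column that is stored as byte_array.

```scala
import org.apache.spark.sql.SparkSession

object ParquetSessionConf {
  // Applies the Parquet-related settings at session level instead of
  // passing them to the DataFrameWriter via .option(...).
  def configure(spark: SparkSession): Unit = {
    // Write side: emit Parquet in the layout used by Spark 1.4 and earlier.
    spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
    // Read side: interpret INT96 values as timestamps when reading Parquet.
    spark.conf.set("spark.sql.parquet.int96AsTimestamp", "true")
    // Read side: adjust INT96 timestamps in files written by Impala.
    spark.conf.set("spark.sql.parquet.int96TimestampConversion", "true")
  }
}
```

Calling ParquetSessionConf.configure(spark) before the write sets the properties for every subsequent Parquet read or write in that session.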

Where is the problem?

Update 2

The DataFrame column patient_dob has type DateType.

The Redshift field patient_dob has type date.

S3 Select on the Parquet field patient_dob shows the following: "1960-05-28"
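The detailed error reports the staged column as byte_array, which is the Parquet physical type Spark uses for strings, whereas a DateType column written directly by Spark would normally be stored as int32 with a DATE annotation. One way to see whether the type changes between the DataFrame and the staged files is to read the files back and compare. This is a minimal sketch: the object name StagedParquetCheck and its helpers are hypothetical, and the staged path has to be supplied by the caller.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.DataType

object StagedParquetCheck {
  // Type of `column` in the in-memory DataFrame, before anything is written.
  def dataFrameType(df: DataFrame, column: String): DataType =
    df.schema(column).dataType

  // Type of `column` as Spark sees it when reading the staged Parquet files back.
  def stagedType(spark: SparkSession, stagedPath: String, column: String): DataType =
    spark.read.parquet(stagedPath).schema(column).dataType

  // True when the staged files hold a different type than the DataFrame did,
  // which would point at the write path (for example a library that converts
  // dates to strings before staging) rather than at Redshift.
  def typeChanged(spark: SparkSession, df: DataFrame, stagedPath: String, column: String): Boolean =
    dataFrameType(df, column) != stagedType(spark, stagedPath, column)
}
```

For the case in the question, the call would look like StagedParquetCheck.typeChanged(spark, df, stagedPath, "patient_dob"), where stagedPath is the temporary S3 prefix the files were written to. A result of true with a staged type of StringType would be consistent with the byte_array in the error message.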

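How did you write it? Which library and compression? Can you run sw-select on it?

I wrote it using a customized spark-redshift lib with Parquet support: spark.write.option("compression", "none"). BTW, what do you mean by sw select?

Sorry, I mistyped: S3 Select. Go to S3 in the AWS console and run a select on one of the files (via right-click). My guess is that your spark-redshift is not behaving correctly. If you can, try writing with pandas fastparquet and see whether that works; at least that would establish the cause.

S3 Select returns the data fine. Unfortunately I can't use pandas fastparquet, because the application is written in Scala.

Could you try following what I described here? The problem is not exactly the same, but it may help.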