How can Apache Spark read Avro files with two different partition layouts from S3?
I have an S3 bucket that contains two different partition layouts under the same table prefix:

    s3://bucketname/tablename/year/month/day
    s3://bucketname/tablename/device/year/month/day

The files are in Avro format. I tried to read everything in one go with:

    val df = spark.read.format("com.databricks.spark.avro").load("s3://bucketname/tablename")

The error message is:
java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
Partition column name list #0: xx, yy
Partition column name list #1: xx
For partitioned table directories, data files should only live in leaf directories.
And directories at the same level should have the same partition column name.
Please check the following directories for unexpected files or inconsistent partition column names:
You cannot read both layouts with a single load. As the error itself says, directories at the same level should have the same partition column names. Use two separate S3 paths, each reaching down to its own leaf directories, read them independently, and then, if the schemas match, union the resulting DataFrames.
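A minimal sketch of that approach is below. It assumes the directories use Hive-style `key=value` names (which the error's partition-column lists suggest) and that a `spark-avro` package is on the classpath; the bucket and path globs are placeholders based on the layouts in the question, not verified paths.

```scala
import org.apache.spark.sql.SparkSession

object ReadMixedPartitions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("read-mixed-partitions").getOrCreate()

    // Layout 1: .../tablename/year/month/day — glob down to the leaf dirs
    // so partition discovery only sees one consistent column list.
    val df1 = spark.read
      .format("com.databricks.spark.avro")
      .load("s3://bucketname/tablename/year=*/month=*/day=*")

    // Layout 2: .../tablename/device/year/month/day — same idea, but with
    // the extra leading device directory.
    val df2 = spark.read
      .format("com.databricks.spark.avro")
      .load("s3://bucketname/tablename/device=*/year=*/month=*/day=*")

    // Merge only if the column sets line up. unionByName matches columns
    // by name rather than position, so column order does not matter.
    if (df1.schema.fieldNames.toSet == df2.schema.fieldNames.toSet) {
      val merged = df1.unionByName(df2)
      merged.printSchema()
    } else {
      sys.error("Schemas differ (e.g. the device column); reconcile before merging")
    }
  }
}
```

Note that the second layout carries an extra `device` partition column, so in practice you may need to add a literal `device` column to `df1` (e.g. with `withColumn`) before the union; on Spark 3.1+, `unionByName(df2, allowMissingColumns = true)` can fill the gap with nulls instead.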