Writing Avro from the spark-shell in Spark 2.4 (Scala)
Spark 2.4.0 on Java 1.8.0_161 (Scala 2.11.12), launched with:
spark-shell --jars=spark-avro_2.11-2.4.0.jar
I'm currently working through some POCs with small Avro files. I want to be able to read in a (single) Avro file, make a change, and write it back out.

Reading works fine:
val myAv = spark.read.format("avro").load("myAvFile.avro")
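Before attempting the write, it is worth confirming in the same shell session that the read actually produced a non-empty schema (a quick diagnostic sketch; `myAv` is the dataframe loaded above):

```scala
// Inspect what the Avro read actually produced. StructType is a
// Seq[StructField], so isEmpty tells us whether any columns were read.
myAv.printSchema()            // should list the Avro record's fields
println(myAv.schema.isEmpty)  // "true" here would explain a failed write
myAv.show(5)                  // peek at a few rows
```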
However, when I try to write it back (even before making any changes), I get the error below. I have also tried manually specifying the dataframe's schema, but it made no difference:

.write.option("avroSchema", c_schema.toString).format("avro")...
scala> myAv.write.format("avro").save("./output-av-file.avro")
org.apache.spark.sql.AnalysisException:
Datasource does not support writing empty or nested empty schemas.
Please make sure the data schema has at least one or more column(s).;
  at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$validateSchema(DataSource.scala:733)
  at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:523)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:281)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
  ... 49 elided

The cause is exactly what the exception says: the schema of the dataframe being written is empty. Spark validates the schema before planning the write; this is the check in DataSource.scala that throws the exception:
if (hasEmptySchema(schema)) {
  throw new AnalysisException(
    s"""
       |Datasource does not support writing empty or nested empty schemas.
       |Please make sure the data schema has at least one or more column(s).
     """.stripMargin)
}
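Given that check, any fix has to ensure the dataframe carries at least one column before `save` is called. A minimal sketch of the intended round trip, assuming the built-in Avro module is supplied via `--packages` (the file name is the question's; the `require` guard and output directory name are additions for illustration):

```scala
// Launched with: spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
val df = spark.read.format("avro").load("myAvFile.avro")

// Guard against the empty-schema case that triggers the AnalysisException.
require(df.schema.nonEmpty,
  "read produced no columns - check the input file and the avro jar")

// ... make changes here ...

// Note: save() writes a directory of part files, not a single .avro file.
df.write.format("avro").save("./output-av-dir")
```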