Apache Spark: forcing a writer schema when writing with Spark
I have encrypted data in Avro format with the following schema:
{"type":"record","name":"ProtectionWrapper","namespace":"com.security","fields":
 [{"name":"protectionInfo","type":["null",{"type":"record","name":"ProtectionInfo","fields":
 [{"name":"unprotected","type":"boolean"}]}]}],
 "writerSchema":"{\"type\":\"record\",\"name\":\"Demo\",\"namespace\":\"com.demo\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"}
Here "writerSchema" is the schema of the data before encryption. The data must be written together with this writer schema, so that the decrypt function can use it at decryption time. When I use the code below, the writer schema is written along with the data:
Job mrJob = org.apache.hadoop.mapreduce.Job.getInstance(JavaSparkContext.hadoopConfiguration());
AvroJob.setDataModelClass(mrJob, SpecificData.class);
AvroJob.setOutputKeySchema(mrJob, protectionSchema); // schema shown above
JavaPairRDD<AvroKey<GenericRecord>, NullWritable> encryptedData = encryptionMethod();
encryptedData.saveAsNewAPIHadoopFile("c:\\test", AvroKey.class, NullWritable.class,
        AvroKeyOutputFormat.class, mrJob.getConfiguration());
But if I convert the schema to a struct type and write with Spark, the writer schema is not written along with the data:
StructType type = (StructType) SchemaConverters.toSqlType(protectionSchema).dataType();
Dataset<Row> ds = sparkSession.createDataFrame(rdd, type); // sparkSession: the active SparkSession
ds.write().format("avro").save("c:\\test");
Is it possible to achieve the same thing through Spark's write API, without having to use the saveAsNewAPIHadoopFile() method?
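To be concrete, this sketch is the kind of write I am after. It assumes spark-avro's `avroSchema` write option can be used to force my full protection schema onto the output (whether it also carries the custom "writerSchema" property through is exactly what I am unsure about); `sparkSession`, `rdd`, and `protectionSchema` are the same variables as above.

```java
// Untested sketch: force the protection schema on a plain Spark write
// via spark-avro's "avroSchema" option, instead of saveAsNewAPIHadoopFile().
StructType type = (StructType) SchemaConverters.toSqlType(protectionSchema).dataType();
Dataset<Row> ds = sparkSession.createDataFrame(rdd, type);
ds.write()
  .format("avro")
  // Pass the full Avro schema (including "writerSchema") as JSON,
  // so the writer does not re-derive a schema from the struct type.
  .option("avroSchema", protectionSchema.toString())
  .save("c:\\test");
```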