Apache Spark: forcing a writer schema when writing with Spark
I have encrypted data in Avro format with the following schema:
{"type":"record","name":"ProtectionWrapper","namespace":"com.security","fields":
 [{"name":"protectionInfo","type":["null",{"type":"record","name":"ProtectionInfo","fields":
 [{"name":"unprotected","type":"boolean"}]}]}],
 "writerSchema":"{\"type\":\"record\",\"name\":\"Demo\",\"namespace\":\"com.demo\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"}
Here "writerSchema" is the schema of the data before encryption. The data must be written together with this writer schema, so that the decrypt function can use it at decryption time. When I use the code below, the writer schema is written along with the data:
Job mrJob = org.apache.hadoop.mapreduce.Job.getInstance(JavaSparkContext.hadoopConfiguration());
AvroJob.setDataModelClass(mrJob, SpecificData.class);
AvroJob.setOutputKeySchema(mrJob, protectionSchema); // schema shown above
JavaPairRDD<AvroKey<GenericRecord>, NullWritable> encryptedData = encryptionMethod();
encryptedData.saveAsNewAPIHadoopFile("c:\\test", AvroKey.class, NullWritable.class,
        AvroKeyOutputFormat.class, mrJob.getConfiguration());
But if I convert the schema to a struct type and write with Spark, the writer schema is not written along with the data:
StructType type = (StructType) SchemaConverters.toSqlType(protectionSchema).dataType();
Dataset<Row> ds = sparkSession.createDataFrame(rdd, type); // sparkSession: the active SparkSession
ds.write().format("avro").save("c:\\test");
Is it possible to achieve the same thing through Spark's write API, without having to use the saveAsNewAPIHadoopFile() method?
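To be concrete, this sketch is the kind of write I am after. It assumes spark-avro's `avroSchema` write option can be used to force my full protection schema onto the output (whether it also carries the custom "writerSchema" property through is exactly what I am unsure about); `sparkSession`, `rdd`, and `protectionSchema` are the same variables as above.

```java
// Untested sketch: force the protection schema on a plain Spark write
// via spark-avro's "avroSchema" option, instead of saveAsNewAPIHadoopFile().
StructType type = (StructType) SchemaConverters.toSqlType(protectionSchema).dataType();
Dataset<Row> ds = sparkSession.createDataFrame(rdd, type);
ds.write()
  .format("avro")
  // Pass the full Avro schema (including "writerSchema") as JSON,
  // so the writer does not re-derive a schema from the struct type.
  .option("avroSchema", protectionSchema.toString())
  .save("c:\\test");
```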