Apache Spark: forcing a schema when writing with Spark

I have encrypted data in Avro format with the following schema:

{"type":"record","name":"ProtectionWrapper","namespace":"com.security","fields": 
[{"name":"protectionInfo","type":["null",{"type":"record","name":"ProtectionInfo","fields": 
[{"name":"unprotected","type":"boolean"}]}]}],
"writerSchema":"{"type":"record","name":"Demo","namespace":"com.demo","fields": 
[{"name":"id","type":"string"}]}"}
Here, "writerSchema" is the schema the data had before encryption. The data must be written together with this writer schema so that the decrypt function can use it at decryption time. When I use the code below, the writer schema is written along with the data:

// javaSparkContext: the live JavaSparkContext (hadoopConfiguration() is an instance method)
Job mrJob = org.apache.hadoop.mapreduce.Job.getInstance(javaSparkContext.hadoopConfiguration());
AvroJob.setDataModelClass(mrJob, SpecificData.class);
AvroJob.setOutputKeySchema(mrJob, protectionSchema); // the wrapper schema shown above
JavaPairRDD<AvroKey<GenericRecord>, NullWritable> encryptedData = encryptionMethod();
encryptedData.saveAsNewAPIHadoopFile("c:\\test", AvroKey.class, NullWritable.class,
        AvroKeyOutputFormat.class, mrJob.getConfiguration());
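One way to confirm the attribute really lands in the output is to reopen one of the part files with Avro's container-file reader; the file header carries the complete writer schema, custom attributes included. A small sketch (the part-file name is a placeholder):

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

DataFileReader<GenericRecord> reader = new DataFileReader<>(
        new File("c:\\test\\part-r-00000.avro"), new GenericDatumReader<>());
// The header schema is the full ProtectionWrapper, so the custom attribute is present:
System.out.println(reader.getSchema().getProp("writerSchema"));
reader.close();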
But if I convert the schema to a struct type and write with Spark, the writer schema does not go along with the data:

StructType type = (StructType) SchemaConverters.toSqlType(protectionSchema).dataType();
Dataset<Row> ds = sparkSession.createDataFrame(rdd, type); // sparkSession: the live SparkSession
ds.write().format("avro").save("c:\\test");
Can the same thing be achieved with Spark's write API, without having to use the saveAsNewAPIHadoopFile() method?
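For reference, a minimal sketch of what that might look like, assuming Spark 2.4+ with the built-in Avro data source (the avroSchema write option is documented there; whether a custom top-level attribute like "writerSchema" survives into the file header would need to be verified on the version in use):

ds.write()
  .format("avro")
  .option("avroSchema", protectionSchema.toString()) // pass the wrapper schema verbatim
  .save("c:\\test");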