Apache Spark SparkHadoopWriter fails with NPE on UserProvider


I am using Spark to write data to HBase. I can read the data fine, but the write fails with the exception below. I found similar issues that were resolved by adding the *-site.xml files and the HBase jars, but that did not work for me. I am reading data from one table and writing it to another table. The read works fine, but I get the following exception on the write:

    JavaPairRDD<ImmutableBytesWritable, Put> tablePuts = hBaseRDD.mapToPair(new PairFunction<Tuple2<ImmutableBytesWritable, Result>, ImmutableBytesWritable, Put>() {
        @Override
        public Tuple2<ImmutableBytesWritable, Put> call(Tuple2<ImmutableBytesWritable, Result> results) throws Exception {
            byte[] accountId = results._2().getValue(Bytes.toBytes(COLFAMILY), Bytes.toBytes("accountId"));
            String rowKey = new String(results._2().getRow());
            String accountId2 = Bytes.toString(accountId);
            String prefix = getMd5Hash(rowKey);
            String newrowKey = prefix + rowKey;
            Put put = new Put(Bytes.toBytes(newrowKey));
            put.addColumn(Bytes.toBytes("def"), Bytes.toBytes("accountId"), accountId);
            // the mapper must return a (key, Put) pair; TableOutputFormat only uses the Put
            return new Tuple2<>(new ImmutableBytesWritable(Bytes.toBytes(newrowKey)), put);
        }
    });
    Job newAPIJobConfiguration = Job.getInstance(conf);
    newAPIJobConfiguration.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, OUT_TABLE_NAME);
    newAPIJobConfiguration.setOutputFormatClass(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.class);
    newAPIJobConfiguration.setOutputKeyClass(org.apache.hadoop.hbase.io.ImmutableBytesWritable.class);
    newAPIJobConfiguration.setOutputValueClass(org.apache.hadoop.io.Writable.class);
    tablePuts.saveAsNewAPIHadoopDataset(newAPIJobConfiguration.getConfiguration());
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hbase.security.UserProvider.instantiate(UserProvider.java:123)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:214)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.checkOutputSpecs(TableOutputFormat.java:177)
    at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.assertConf(SparkHadoopWriter.scala:387)
    at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
    at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:831)
    at com.voicebase.etl.s3tohbase.HbaseScan2.main(HbaseScan2.java:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I ran into the same problem when calling saveAsNewAPIHadoopFile. org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil instantiates the TableOutputFormat without setting a configuration on that object, so the output-spec check fails with this NPE. I used saveAsHadoopFile instead. You can also set spark.hadoop.validateOutputSpecs to false, but I am convinced the real issue is a bug in HadoopMapReduceWriteConfigUtil.
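
A minimal sketch of the two workarounds mentioned above, reusing the tablePuts RDD, conf, and OUT_TABLE_NAME from the question. The old-API variant uses saveAsHadoopDataset with org.apache.hadoop.hbase.mapred.TableOutputFormat, which is my reading of the "old API" route (the answer itself mentions saveAsHadoopFile); treat it as an assumption rather than the answer's exact code.

    // Workaround 1 (assumption: set on the SparkConf used to build the context):
    // skip Spark's output-spec validation, so TableOutputFormat.checkOutputSpecs,
    // where the NullPointerException is thrown, is never invoked.
    SparkConf sparkConf = new SparkConf()
            .setAppName("HbaseScan2")
            .set("spark.hadoop.validateOutputSpecs", "false");

    // Workaround 2: write through the old mapred API. Here the JobConf is passed
    // directly into checkOutputSpecs, so nothing relies on setConf() having been
    // called on the output format the way the new-API path does.
    org.apache.hadoop.mapred.JobConf jobConf = new org.apache.hadoop.mapred.JobConf(conf);
    jobConf.setOutputFormat(org.apache.hadoop.hbase.mapred.TableOutputFormat.class);
    jobConf.set(org.apache.hadoop.hbase.mapred.TableOutputFormat.OUTPUT_TABLE, OUT_TABLE_NAME);
    tablePuts.saveAsHadoopDataset(jobConf);

Disabling validateOutputSpecs only skips the failing check, while the old-API route avoids the HadoopMapReduceWriteConfigUtil code path altogether.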