
Java Spark: partitioning an RDD created from HBase data


If I use

JavaPairRDD<ImmutableBytesWritable, Result> usersRDD = sc.newAPIHadoopRDD(hbaseConf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
the resulting RDD has 1 partition, as I can see by calling usersRDD.partitions().size(). Repartitioning with something like usersRDD.repartition(10) is not an option, because Spark complains that ImmutableBytesWritable is not serializable.


Is there a way to have Spark create a partitioned RDD from HBase data?
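A minimal sketch of the usual workaround for the serialization error, assuming a hypothetical table named "users", Spark in local mode, and that only each row's first cell value is needed: because ImmutableBytesWritable and Result are Hadoop Writables rather than java.io.Serializable, copy every record into plain serializable types before any shuffle.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class RepartitionHBaseRDD {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("hbase-repartition").setMaster("local[*]"));

        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "users"); // hypothetical table name

        // One Spark partition per HBase region: a single-region table gives one partition.
        JavaPairRDD<ImmutableBytesWritable, Result> usersRDD = sc.newAPIHadoopRDD(
                hbaseConf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);

        // ImmutableBytesWritable and Result are Hadoop Writables, not
        // java.io.Serializable, so copy them into plain types before shuffling.
        JavaPairRDD<byte[], byte[]> serializable = usersRDD.mapToPair(pair ->
                new Tuple2<>(pair._1().copyBytes(),  // row key as a fresh byte[]
                             pair._2().value()));    // value of the record's first cell

        // Repartitioning now succeeds because the shuffled records are serializable.
        System.out.println(serializable.repartition(10).partitions().size()); // prints 10

        sc.stop();
    }
}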

When using org.apache.hadoop.hbase.mapreduce.TableInputFormat, the number of Spark partitions is determined by the number of regions of the HBase table; in your case that is 1, the default for a new table. Have a look at the linked answer for more details.