Writing to HBase in a Spark job: a conundrum with existential types


scala hadoop hbase apache-spark


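I am trying to write a Spark job that puts its output into HBase. As far as I can tell, the right way to do this is to use the method saveAsHadoopDataset on org.apache.spark.rdd.PairRDDFunctions, which requires my RDD to consist of pairs.

The method saveAsHadoopDataset requires a JobConf, and that is what I am trying to construct. From what I have read, one thing I must set on my JobConf is the output format (in fact, it does not work without it).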

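The problem is that this apparently does not compile, because TableOutputFormat is generic, even though it ignores its type parameter. So I have tried various combinations, such as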

jobConfig.setOutputFormat(classOf[TableOutputFormat[Unit]])
jobConfig.setOutputFormat(classOf[TableOutputFormat[_]])

but in every case I get the error

required: Class[_ <: org.apache.hadoop.mapred.OutputFormat[_, _]]
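The choice between Unit and _ cannot matter at runtime: type arguments are erased, so classOf[C[Unit]] and classOf[C[String]] evaluate to the same Class object, and only the compile-time subtype check decides whether the call is accepted. A minimal self-contained sketch of that erasure behaviour, using hypothetical stand-in types (OutputFormatLike, GenericTableFormat, ErasureDemo) instead of the real Hadoop/HBase classes:

```scala
// Hypothetical stand-ins; these are NOT the real Hadoop/HBase classes.
trait OutputFormatLike[K, V]

// A generic class whose type parameter appears only in its type,
// loosely mirroring a generic TableOutputFormat.
class GenericTableFormat[K] extends OutputFormatLike[K, String]

object ErasureDemo {
  val unitClass: Class[GenericTableFormat[Unit]] = classOf[GenericTableFormat[Unit]]
  val stringClass: Class[GenericTableFormat[String]] = classOf[GenericTableFormat[String]]

  // Type arguments are erased, so both values are the very same Class object.
  val sameRuntimeClass: Boolean = unitClass eq stringClass

  def main(args: Array[String]): Unit =
    println(sameRuntimeClass) // prints true
}
```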

However, I can call a test method of my own, foo, with

foo(classOf[TableOutputFormat[Unit]])

or even

foo(classOf[TableOutputFormat[_]])

That works too. But I cannot call

jobConf.setOutputFormat(classOf[TableOutputFormat[_]])
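The foo experiment can be reproduced without Hadoop. The sketch below assumes foo took a parameter of the same type as in the error message; OutputFormat and TableOutputFormat are replaced by hypothetical stand-ins (Format, TableFormat, ExistentialDemo). Class is invariant, but Class[_ <: Format[_, _]] only asks for Class[T] for some T <: Format[_, _], and TableFormat[Unit] is such a T, so the call type-checks:

```scala
// Hypothetical stand-ins; these are NOT the real Hadoop/HBase classes.
trait Format[K, V]
class TableFormat[K] extends Format[K, String]

object ExistentialDemo {
  // Class[_ <: Format[_, _]] reads as: Class[T] forSome { type T <: Format[_, _] }.
  // Class itself is invariant, but the existential lets T be any subtype of Format.
  def foo(x: Class[_ <: Format[_, _]]): Class[_ <: Format[_, _]] = x

  // Compiles: TableFormat[Unit] <: Format[Unit, String] <: Format[_, _]
  val accepted: Class[_ <: Format[_, _]] = foo(classOf[TableFormat[Unit]])

  // Would NOT compile, because Class is invariant and no existential is involved:
  // val rejected: Class[Format[_, _]] = classOf[TableFormat[Unit]]

  def main(args: Array[String]): Unit =
    println(classOf[Format[_, _]].isAssignableFrom(accepted)) // prints true
}
```

This suggests the existential bound itself is not the obstacle: if a call like setOutputFormat is still rejected, the class being passed is most likely not a subtype of the required OutputFormat at all.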

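The original Java signature of setOutputFormat is

void setOutputFormat(Class

This is strange. Are you 100% sure your imports are correct (edit: yes, that was the problem; see the comments), and that your build file has the right versions of the artifacts? It may help if I share a snippet from my working project: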
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.mapred.JobConf
// Old-API (mapred) TableOutputFormat: not generic, so classOf needs no type argument
import org.apache.hadoop.hbase.mapred.TableOutputFormat

val conf = HBaseConfiguration.create()

val jobConfig: JobConf = new JobConf(conf, this.getClass)
jobConfig.setOutputFormat(classOf[TableOutputFormat])
// outputTable: name of the target HBase table, defined elsewhere
jobConfig.set(TableOutputFormat.OUTPUT_TABLE, outputTable)

My dependencies include:
"org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
"org.apache.hbase" % "hbase-client" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-common" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-hadoop-compat" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-it" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-hadoop2-compat" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-prefix-tree" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-protocol" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-server" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-shell" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-testing-util" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-thrift" % "0.96.1.1-cdh5.0.0",

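Since org.apache.hadoop.hbase.mapred.TableOutputFormat is deprecated, you can use the following code as a draft:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.mapreduce.Job
...
val hConf = HBaseConfiguration.create()

// Job.getInstance copies hConf, so configure the job's own Configuration
val job = Job.getInstance(hConf)
val jobConf = job.getConfiguration
jobConf.set(TableOutputFormat.OUTPUT_TABLE, tableName) // tableName: target HBase table, defined elsewhere
// The new-API (mapreduce) TableOutputFormat is generic, so it needs a type argument
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
...
// rdd: pairs whose values are HBase mutations such as Put, e.g. (ImmutableBytesWritable, Put)
rdd.saveAsNewAPIHadoopDataset(jobConf)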

What version of Scala are you using? I don't even understand why classOf[TableOutputFormat] compiles for you, since TableOutputFormat is generic :-( In any case, I am using JVM 1.7, Scala 2.10.3 and "org.apache.spark" %% "spark-core" % "0.9.1", "org.apache.hbase" % "hbase-common" % "0.96.1-cdh5.0", "org.apache.hbase" % "hbase-client" % "0.96.1.1-cdh5.0.0", "org.apache.hbase" % "hbase-server" % "0.96.1.1-cdh5.0.0"

It turns out I was importing

import org.apache.hadoop.hbase.mapreduce.TableOutputFormat

instead of

import org.apache.hadoop.hbase.mapred.TableOutputFormat

:-/

Easy to miss ;)
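Why the import made the difference: HBase ships two TableOutputFormat classes. org.apache.hadoop.hbase.mapred.TableOutputFormat belongs to the old mapred API and is not generic, which is why a bare classOf[TableOutputFormat] compiles with that import. org.apache.hadoop.hbase.mapreduce.TableOutputFormat is generic and extends the new-API abstract class org.apache.hadoop.mapreduce.OutputFormat, which is unrelated to the org.apache.hadoop.mapred.OutputFormat named in the error message, so no type argument can make it fit JobConf.setOutputFormat. The sketch below models the two hierarchies with hypothetical stand-ins (OldApiOutputFormat, NewApiOutputFormat, OldApiTableFormat, NewApiTableFormat, and a stand-in setOutputFormat that only mimics the required parameter type):

```scala
// Hypothetical stand-ins mirroring the two Hadoop hierarchies; NOT the real classes.
trait OldApiOutputFormat[K, V]          // plays org.apache.hadoop.mapred.OutputFormat
abstract class NewApiOutputFormat[K, V] // plays org.apache.hadoop.mapreduce.OutputFormat

// Plays hbase.mapred.TableOutputFormat: non-generic, old API.
class OldApiTableFormat extends OldApiOutputFormat[Array[Byte], String]

// Plays hbase.mapreduce.TableOutputFormat: generic, new API.
class NewApiTableFormat[K] extends NewApiOutputFormat[K, String]

object ImportMixUpDemo {
  // Stand-in with the same parameter shape as in the error message.
  def setOutputFormat(c: Class[_ <: OldApiOutputFormat[_, _]]): Class[_ <: OldApiOutputFormat[_, _]] = c

  // Compiles: no type argument is needed because the old-API class is not generic.
  val configured: Class[_ <: OldApiOutputFormat[_, _]] = setOutputFormat(classOf[OldApiTableFormat])

  // Would NOT compile for any type argument, since the hierarchies are unrelated:
  // setOutputFormat(classOf[NewApiTableFormat[Unit]])

  // The same relationship checked at runtime via reflection.
  val oldFits: Boolean = classOf[OldApiOutputFormat[_, _]].isAssignableFrom(classOf[OldApiTableFormat])
  val newFits: Boolean = classOf[OldApiOutputFormat[_, _]].isAssignableFrom(classOf[NewApiTableFormat[_]])

  def main(args: Array[String]): Unit =
    println(s"old API fits: $oldFits, new API fits: $newFits") // prints "old API fits: true, new API fits: false"
}
```

With the mapreduce import, the matching entry points are Job.setOutputFormatClass and saveAsNewAPIHadoopDataset, as in the draft answer above.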