
Apache Spark: Spark Streaming HBase error


I want to insert streaming data into HBase; here is my code:

val tableName = "streamingz"
val conf = HBaseConfiguration.create()
conf.addResource(new Path("file:///opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/etc/hbase/conf.dist/hbase-site.xml"))
conf.set(TableInputFormat.INPUT_TABLE, tableName)

val admin = new HBaseAdmin(conf)
if (!admin.isTableAvailable(tableName)) {
    print("-----------------------------------------------------------------------------------------------------------")
    val tableDesc = new HTableDescriptor(tableName)
    tableDesc.addFamily(new HColumnDescriptor("z1".getBytes()))
    tableDesc.addFamily(new HColumnDescriptor("z2".getBytes()))
    admin.createTable(tableDesc)
} else {
    print("Table already exists!!--------------------------------------------------------------------------------------")
}
val ssc = new StreamingContext(sc, Seconds(10))
val topicSet = Set("fluxAstellia")
val kafkaParams = Map[String, String]("metadata.broker.list" -> "10.32.201.90:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicSet)
val lines = stream.map(_._2).map(_.split(" ", -1)).foreachRDD(rdd => {
    if (!rdd.partitions.isEmpty) {
        val myTable = new HTable(conf, tableName)
        rdd.map(rec => {
            var put = new Put(rec._1.getBytes)
            put.add("z1".getBytes(), "name".getBytes(), Bytes.toBytes(rec._2))
            myTable.put(put)
        }).saveAsNewAPIHadoopDataset(conf)
        myTable.flushCommits()
    } else {
        println("rdd is empty")
    }

})


ssc.start()
ssc.awaitTermination()

}
}
I got this error:

:66: error: value _1 is not a member of Array[String]
       var put = new Put(rec._1.getBytes)
I am a beginner, so I cannot work out how to fix this error, and I also have a question:

Where should I create the table: outside or inside the streaming process?


Thanks

Your error is basically in the line

var put = new Put(rec._1.getBytes)

._1 can only be called on a Map (where _1 is the key and _2 the value) or on a tuple.

rec is the array of strings you get by splitting the string from the stream on the space character. If you are after the first element, write it as

var put = new Put(rec(0).getBytes)

Likewise, on the next line you can write it as

put.add("z1".getBytes(), "name".getBytes(), Bytes.toBytes(rec(1)))
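To make the distinction concrete, here is a tiny illustrative snippet (not from the original thread) showing why ._1 works on a tuple but not on the Array[String] that split produces:

val pair: (String, String) = ("key", "value")
println(pair._1)    // prints "key" -- tuples expose _1, _2, ...

val rec: Array[String] = "key value".split(" ", -1)
println(rec(0))     // prints "key" -- arrays are indexed with (i)
// println(rec._1)  // does not compile: value _1 is not a member of Array[String]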

What about creating the HBase table, where should I create it? And I got this new error, brother:

ERROR JobScheduler: Error running job streaming job 149279000 ms.0 org.apache.spark.SparkException: Task not serializable
The stack trace should tell you which class is not serializable. Anything used inside the map() closure has to be serializable. My guess is that HTable is the part that is not serializable. You can either make it serializable with java.io.Serializable, replacing the line
val myTable = new HTable(conf, tableName)
or mark it as
@transient lazy
so that each executor creates its own instance (if that is what you want).

Thanks for your answer, brother, but I still have the same problem.
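For reference, here is a hedged sketch (not the asker's final code) of one common way to avoid the Task not serializable error: build the HBase table handle inside foreachPartition, so it is created on each executor rather than captured by the closure that is serialized from the driver. It reuses the names from the question (stream, tableName, column family z1) and assumes hbase-site.xml is available on the executor classpath.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

// `stream` and `tableName` are the values defined in the question's code.
val parsed = stream.map(_._2).map(_.split(" ", -1))   // DStream[Array[String]]

parsed.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    rdd.foreachPartition { records =>
      // Created on the executor, so nothing unserializable travels with the closure.
      val conf = HBaseConfiguration.create()
      val myTable = new HTable(conf, tableName)
      records.foreach { rec =>
        val put = new Put(rec(0).getBytes)
        put.add("z1".getBytes(), "name".getBytes(), Bytes.toBytes(rec(1)))
        myTable.put(put)
      }
      myTable.flushCommits()
      myTable.close()
    }
  }
}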