Exception when connecting to HBase from Spark (Scala)

Tags: scala, maven, apache-spark, apache-spark-sql, hbase

I am using Spark to connect to HBase. I have added all the dependencies, but I still get this exception. Please help me figure out which jar I need to add to resolve it.

SPARK_MAJOR_VERSION is set to 2, using Spark2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/phoenix-4.7.0.2.6.5.0-292-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/phoenix-4.7.0.2.6.5.0-292-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/09/17 05:34:36 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://sandbox-hdp.hortonworks.com:4041
Spark context available as 'sc' (master = local[*], app id = local-1537162476668).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0.2.6.5.0-292
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

 import org.apache.spark.sql.{SQLContext, _}
 import org.apache.spark.sql.execution.datasources.hbase._
 import org.apache.spark.{SparkConf, SparkContext}
 import spark.sqlContext.implicits._
 import org.apache.hadoop.hbase.HBaseConfiguration
 import org.apache.hadoop.hbase.client.{ConnectionFactory,HBaseAdmin,HTable,Put,Get}
 import org.apache.hadoop.hbase.util.Bytes
 import org.apache.hadoop.hbase.mapreduce.TableInputFormat
 import org.apache.hadoop.hbase.client.HBaseAdmin
 import org.apache.hadoop.hbase.{HTableDescriptor,HColumnDescriptor}

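  // shc catalog: maps the HBase table 'Contacts' (column families Office and Personal) to DataFrame columns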
  def catalog = s"""{
     |"table":{"namespace":"default", "name":"Contacts"},
     |"rowkey":"key",
     |"columns":{
     |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
     |"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
     |"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
     |"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
     |"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
     |}
 |}""".stripMargin

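      // Load the HBase table as a DataFrame through the shc data source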
      def withCatalog(cat: String): DataFrame = {
         spark.sqlContext
         .read
         .options(Map(HBaseTableCatalog.tableCatalog->cat))
         .format("org.apache.spark.sql.execution.datasources.hbase")
         .load()
     }
 val df = withCatalog(catalog)
 df.registerTempTable("contacts")
 val query = spark.sqlContext.sql("select personalName, officeAddress from contacts")
 query.show()
Below are the jars available in the Spark jars folder:

hbase-0.94.2.jar
hbase-annotations-1.2.0.jar
hbase-client-2.1.0.jar
hbase-common-2.1.0.jar
hbase-hadoop-compat-2.1.0.jar
hbase-hadoop2-compat-2.1.0.jar
hbase-it-1.1.2.2.6.5.0-292.jar
hbase-prefix-tree-1.1.2.2.6.5.0-292.jar
hbase-procedure-1.1.2.2.6.5.0-292.jar
hbase-protocol-2.1.0.jar
hbase-server-2.1.0.jar
hbase-spark-1.2.0-cdh5.8.3.jar
hbase-spark-1.1.2.2.6.5.0-292.jar
hbase-thrift-1.1.2.2.6.5.0-292.jar
hive-hbase-handler-0.12.0-cdh5.1.3.jar
hive-hbase-handler-3.1.0.jar
protobuf-java-3.5.1.jar

Please suggest which jar I am missing from the jars folder so that I can connect to HBase.

It looks like you are missing the shc-core jar, which is used to read and write DataFrames to HBase and is implemented by Hortonworks.

When you import the package from the Hortonworks shc connector

import org.apache.spark.sql.execution.datasources.hbase._

you need to add its jar to your Spark application.

Steps to get the shc-core connector jar:

First clone the GitHub repository, check out the branch appropriate for the HBase and Hadoop versions you use in your environment, and build it with

mvn clean install -DskipTests

Once the build completes, the jar will be available under

~/.m2/repository/com/hortonworks/shc/
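A minimal sketch of those steps (the repository location is the hortonworks-spark/shc project on GitHub; the branch name below is an assumption, pick whichever branch in the repo matches your Spark/HBase versions):

git clone https://github.com/hortonworks-spark/shc.git
cd shc
git checkout branch-2.3        # assumed branch name; check the repo for the one matching your stack
mvn clean install -DskipTests
ls ~/.m2/repository/com/hortonworks/shc/shc-core/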

Use that jar in your Spark application.

You can either add it to the Spark jars folder or pass it with the --jars flag to spark-submit / spark-shell, as shown below.
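For spark-submit the flag works the same way. A sketch, where the application class and jar names are placeholders for your own (the shc jar version matches the spark-shell example further down; your build may produce a different one):

spark-submit --jars /path/to/shc-core-1.1.3-2.4-s_2.11.jar --class com.example.HBaseApp my-app.jar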

Then try running the code you were executing again.

I followed the same steps and was able to read from HBase using the catalog.

Example:

spark-shell --jars shc-core-1.1.3-2.4-s_2.11.jar

SPARK_MAJOR_VERSION is set to 2, using Spark2
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://sandbox-hdp.hortonworks.com:4040
Spark context available as 'sc' (master = yarn, app id = application_1592322799672_0007).
Spark session available as 'spark'.
Welcome to
   ____              __
  / __/__  ___ _____/ /__
 _\ \/ _ \/ _ `/ __/  '_/
/___/ .__/\_,_/_/ /_/\_\   version 2.4.0.7.0.3.0-79
   /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.{SQLContext, _}
 import org.apache.spark.sql.execution.datasources.hbase._
 import org.apache.spark.{SparkConf, SparkContext}
 import spark.sqlContext.implicits._
 import org.apache.hadoop.hbase.HBaseConfiguration
 import org.apache.hadoop.hbase.client.{ConnectionFactory,HBaseAdmin,HTable,Put,Get}
 import org.apache.hadoop.hbase.util.Bytes
 import org.apache.hadoop.hbase.mapreduce.TableInputFormat
 import org.apache.hadoop.hbase.client.HBaseAdmin
 import org.apache.hadoop.hbase.{HTableDescriptor,HColumnDescriptor}

  def catalog = s"""{
     |"table":{"namespace":"default", "name":"Contacts"},
     |"rowkey":"key",
     |"columns":{
     |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
     |"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
     |"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
     |"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
     |"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
     |}
 |}""".stripMargin

      def withCatalog(cat: String): DataFrame = {
         spark.sqlContext
         .read
         .options(Map(HBaseTableCatalog.tableCatalog->cat))
         .format("org.apache.spark.sql.execution.datasources.hbase")
         .load()
     }
 val df = withCatalog(catalog)
 df.registerTempTable("contacts")
 val query = spark.sqlContext.sql("select personalName, officeAddress from contacts")
 query.show()

// Exiting paste mode, now interpreting.

warning: there was one deprecation warning; re-run with -deprecation for details
Hive Session ID = 5cc02976-98c4-447f-9ba0-e70c4a3c4ab1
+------------+-------------+                                                    
|personalName|officeAddress|
+------------+-------------+
|John Jackson| 40 Ellis St.|
|John Jackson| 40 Ellis St.|
+------------+-------------+

import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import spark.sqlContext.implicits._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{ConnectionFactory, HBaseAdmin, HTable, Put, Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor, HColumnDescriptor}
catalog: String
withCatalog: (cat: String)org.apache.spark.sql.DataFrame
df: org.apache.spark.sql.DataFrame = [rowkey: string, officeAddress: string ... 3 more fields]
query: org.apache.spark.sql.DataFrame = [personalName: string, officeAddress: string]

scala> query.show()
+------------+-------------+
|personalName|officeAddress|
+------------+-------------+
|John Jackson| 40 Ellis St.|
|John Jackson| 40 Ellis St.|
+------------+-------------+


scala> 
Stack versions:

HBase 2.2.0
Hadoop 3.1.1
Spark 2.4.0
Scala 2.11.12
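The same connector also handles writes (that is what shc-core is for, as noted above). A minimal write sketch using the same catalog, assuming the options documented in the shc README (HBaseTableCatalog.newTable is only needed when the table does not exist yet and sets the number of regions to create):

df.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()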


Try using hbase-protocol-shaded instead of hbase-protocol.

Hi @shay__, I downloaded hbase-protocol-shaded-2.1.0.jar and removed hbase-protocol-2.1.0.jar. Now I am getting a different exception ------> warning: there was one deprecation warning; re-run with -deprecation for details. java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:221). Using HBase version 1.2.6.