pyspark - Spark HBase connector cannot find the data source


I am trying to connect to HBase from PySpark using the SHC API, following the link below.

Sample code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Hbase Read").getOrCreate()

# Fully qualified package name of the SHC data source.
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

# SHC catalog: maps the HBase row key and a column family/qualifier to
# DataFrame columns. The join/split trick strips whitespace so the
# catalog is passed as one compact JSON string.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"table"},
    "rowkey":"key",
    "columns":{
        "firstcol":{"cf":"rowkey", "col":"key", "type":"string"},
        "secondcol":{"cf":"cf", "col":"col1", "type":"int"}
    }
}""".split())

df = spark.read \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .load()
df.show()
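As a side note, the ''.join(...split()) idiom removes every run of whitespace from the string, which happens to produce valid JSON here but is fragile. A safer way to build the same catalog string is to serialize a Python dict; a minimal sketch using the same table layout as above:

import json

# Equivalent catalog built from a dict; json.dumps handles the quoting
# and never touches whitespace inside string values.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "table"},
    "rowkey": "key",
    "columns": {
        "firstcol": {"cf": "rowkey", "col": "key", "type": "string"},
        "secondcol": {"cf": "cf", "col": "col1", "type": "int"}
    }
})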
Spark submit:

spark-submit  --packages com.hortonworks:shc:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ TestHbaseRead.py
I get this error.

Error log:

: java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.execution.datasources.hbase. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.hbase.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)
        ... 13 more
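The root cause shown in the trace is that the driver never loaded org.apache.spark.sql.execution.datasources.hbase.DefaultSource, i.e. the SHC jar did not make it onto the classpath. A quick sanity check, as a sketch to run inside the same PySpark session, is to print the jar-related settings the session actually received:

# Show which jars/packages spark-submit actually registered with this session.
conf = spark.sparkContext.getConf()
print(conf.get("spark.jars.packages", "<not set>"))
print(conf.get("spark.jars", "<not set>"))

If spark.jars.packages comes back unset, the --packages flag never reached the session at all.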

Hi Shankar, were you able to find a solution? I am facing a similar issue; if you found anything, please share your solution. Regards.

@ChauhanB: For me, the problem was that I had downloaded the jar and was adding the jar location when running spark-submit; I think that was the issue. That was a long time ago, though...

Thanks for the reply. I finally solved the problem and have posted a complete end-to-end implementation of how to use SHC as an answer to my question.
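The end-to-end answer mentioned above is not reproduced on this page. One commonly reported cause of this exact error is the Maven coordinate: com.hortonworks:shc is the parent POM, while the connector jar itself is published as com.hortonworks:shc-core. A hedged sketch of the corrected submit command follows; the hbase-site.xml path is an assumed example location for the HBase client config, which SHC needs on the classpath:

spark-submit \
  --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  --files /etc/hbase/conf/hbase-site.xml \
  TestHbaseRead.py

With the connector jar actually resolved, the read code above should find the data source by its fully qualified name.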