Apache Spark: Spark with Hive reports ClassNotFoundException: com.ibm.biginsights.bigsql.sync.BIEventListener

I am trying to run a pyspark script on BigInsights on Cloud 4.2 Enterprise that accesses a Hive table.

First, I create the Hive table:

[biadmin@bi4c-xxxxx-mastermanager ~]$ hive
hive> CREATE TABLE pokes (foo INT, bar STRING);
OK
Time taken: 2.147 seconds
hive> LOAD DATA LOCAL INPATH '/usr/iop/4.2.0.0/hive/doc/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Loading data to table default.pokes
Table default.pokes stats: [numFiles=1, numRows=0, totalSize=5812, rawDataSize=0]
OK
Time taken: 0.49 seconds
hive>

Then I created a simple pyspark script:

[biadmin@bi4c-xxxxxx-mastermanager ~]$ cat test_pokes.py
from pyspark import SparkContext

sc = SparkContext()

from pyspark.sql import HiveContext
hc = HiveContext(sc)

pokesRdd = hc.sql('select * from pokes')
print( pokesRdd.collect() )

I attempt to run it with:

[biadmin@bi4c-xxxxxx-mastermanager ~]$ spark-submit \
    --master yarn-cluster \
    --deploy-mode cluster \
    --jars /usr/iop/4.2.0.0/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-core-3.2.10.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-rdbms-3.2.9.jar \
    --files /usr/iop/4.2.0.0/hive/conf/hive-site.xml \
    test_pokes.py

However, I get the following error:

You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
Traceback (most recent call last):
  File "test_pokes.py", line 8, in <module>
    pokesRdd = hc.sql('select * from pokes')
  File "/disk2/local/usercache/biadmin/appcache/application_1477084339086_0485/container_e09_1477084339086_0485_02_000001/pyspark.zip/pyspark/sql/context.py", line 580, in sql
  ...
  File /container_e09_1477084339086_0485_02_000001/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
    ...
    ...
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
    ...
    ... 27 more
Caused by: MetaException(message:Failed to instantiate listener named: com.ibm.biginsights.bigsql.sync.BIEventListener, reason: java.lang.ClassNotFoundException: com.ibm.biginsights.bigsql.sync.BIEventListener)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreListeners(MetaStoreUtils.java:1478)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:481)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    ... 32 more
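
The innermost "Caused by" comes from MetaStoreUtils.getMetaStoreListeners, which instantiates the listener classes named in the metastore configuration, so the hive-site.xml shipped with --files is the place to look. Below is a minimal sketch, not a definitive diagnostic, that lists listener-related properties in a hive-site.xml. The SAMPLE content is hypothetical: which property actually names BIEventListener on a BigInsights cluster is an assumption here, and the host name is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml fragment, for illustration only.
# The property that really carries the listener on BigInsights may differ.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.event.listeners</name>
    <value>com.ibm.biginsights.bigsql.sync.BIEventListener</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://example-host:9083</value>
  </property>
</configuration>"""


def listener_properties(xml_text):
    """Return {name: value} for every property whose name mentions 'listener'."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if "listener" in name.lower() and value:
            found[name] = value
    return found


for name, value in listener_properties(SAMPLE).items():
    print(name, "=", value)
```

To check a real file, read its contents and pass the text to listener_properties. Any class listed there has to be on the classpath of the Spark driver, otherwise the metastore client fails with a ClassNotFoundException like the one above.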

See also an earlier error related to this problem:

The solution is to use the hive-site.xml from the spark-client folder instead of the one from the Hive folder:

[biadmin@bi4c-xxxxxx-mastermanager ~]$ spark-submit \
    --master yarn-cluster \
    --deploy-mode cluster \
    --jars /usr/iop/4.2.0.0/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-core-3.2.10.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-rdbms-3.2.9.jar \
    --files /usr/iop/current/spark-client/conf/hive-site.xml \
    test_pokes.py
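
spark-submit expects the --jars value as one comma-separated argument with no whitespace between entries. The following is a rough sketch of assembling the same command from Python so the list is joined correctly. The helper name build_spark_submit is hypothetical; the paths and option values are the ones used in the command above.

```python
import shlex


def build_spark_submit(script, jars, files,
                       master="yarn-cluster", deploy_mode="cluster"):
    """Return the spark-submit argument list for a PySpark script.

    build_spark_submit is an illustrative helper, not part of Spark.
    """
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--jars", ",".join(jars),    # single argument, no spaces
        "--files", ",".join(files),
        script,
    ]


HIVE_LIB = "/usr/iop/4.2.0.0/hive/lib"

cmd = build_spark_submit(
    "test_pokes.py",
    jars=[
        HIVE_LIB + "/datanucleus-api-jdo-3.2.6.jar",
        HIVE_LIB + "/datanucleus-core-3.2.10.jar",
        HIVE_LIB + "/datanucleus-rdbms-3.2.9.jar",
    ],
    files=["/usr/iop/current/spark-client/conf/hive-site.xml"],
)

# Print a copy-pasteable shell command.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can also be passed directly to subprocess.call(cmd), which avoids shell quoting altogether.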

This is covered in the documentation: