Running Apache Hive on Kubernetes (without YARN)


Is it possible to run Apache Hive on Kubernetes (without also running YARN on Kubernetes)?


I can't find any solid information about this online — is running Hive on Kubernetes an unusual thing to do?

Hive on MR3 runs on Kubernetes, because MR3 (a new execution engine for Hadoop and Kubernetes) provides native support for Kubernetes.


Please take a look at my blog post on this topic:

Assuming you are running Spark as the batch execution engine of your data lake, it is easy to run HiveServer2 on Spark — that is, the Spark Thrift Server, which is compatible with HiveServer2.
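(For context, a local sketch only, not a step in this Kubernetes setup: a standalone Spark distribution ships the Thrift server as a launch script that accepts the usual spark-submit options.)

# Local illustration only: Spark's bundled launcher for the Thrift server.
$SPARK_HOME/sbin/start-thriftserver.sh --master local[*] --hiveconf hive.server2.thrift.port=10016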

Before submitting the Spark Thrift Server to Kubernetes, you should install the Hive Metastore on Kubernetes. There is a good way to install the Hive Metastore on Kubernetes:
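As a quick sanity check once the metastore is up — the spark-submit command below expects it at thrift://metastore.hive-metastore.svc.cluster.local:9083, which implies a Service named metastore in the hive-metastore namespace exposing port 9083 (names inferred from that URI, not prescribed here):

# Verify the metastore Service assumed by the spark-submit configuration below.
kubectl get svc metastore -n hive-metastore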

Because spark-submit refuses to launch the Spark Thrift Server on Kubernetes in cluster mode (it rejects the HiveThriftServer2 main class outright), you can write a simple wrapper class whose main method runs the Thrift server class, like this:


public class SparkThriftServerRunner {

    // Delegate straight to Spark's Thrift server entry point; spark-submit
    // accepts this wrapper class in cluster mode where it would reject
    // HiveThriftServer2 itself.
    public static void main(String[] args) {
        org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(args);
    }
}
and build the Spark application uber jar with the Maven Shade plugin.
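As a rough sketch of that build step (assuming the Shade plugin is already configured in the project's pom.xml; the artifact name is taken from the spark-submit command below):

# Package SparkThriftServerRunner and all dependencies into one uber jar.
# The Shade plugin configuration lives in pom.xml (not shown here).
mvn clean package
ls target/spark-thrift-server-1.0.0-SNAPSHOT-spark-job.jar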

Now you are ready to submit the Spark Thrift Server to Kubernetes. To do so, run the following command:

spark-submit \
--master k8s://https://10.233.0.1:443 \
--deploy-mode cluster \
--name spark-thrift-server \
--class io.spongebob.hive.SparkThriftServerRunner \
--packages com.amazonaws:aws-java-sdk-s3:1.11.375,org.apache.hadoop:hadoop-aws:3.2.0 \
--conf spark.kubernetes.file.upload.path=s3a://mykidong/spark-thrift-server \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.container.image=mykidong/spark:v3.0.0 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.hadoop.hive.metastore.client.connect.retry.delay=5 \
--conf spark.hadoop.hive.metastore.client.socket.timeout=1800 \
--conf spark.hadoop.hive.metastore.uris=thrift://metastore.hive-metastore.svc.cluster.local:9083 \
--conf spark.hadoop.hive.server2.enable.doAs=false \
--conf spark.hadoop.hive.server2.thrift.http.port=10002 \
--conf spark.hadoop.hive.server2.thrift.port=10016 \
--conf spark.hadoop.hive.server2.transport.mode=binary \
--conf spark.hadoop.metastore.catalog.default=spark \
--conf spark.hadoop.hive.execution.engine=spark \
--conf spark.hadoop.hive.input.format=io.delta.hive.HiveInputFormat \
--conf spark.hadoop.hive.tez.input.format=io.delta.hive.HiveInputFormat \
--conf spark.sql.warehouse.dir=s3a://mykidong/apps/spark/warehouse \
--conf spark.hadoop.fs.defaultFS=s3a://mykidong \
--conf spark.hadoop.fs.s3a.access.key=bWluaW8= \
--conf spark.hadoop.fs.s3a.secret.key=bWluaW8xMjM= \
--conf spark.hadoop.fs.s3a.endpoint=http://10.233.25.63:9099 \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.fast.upload=true \
--conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
--conf spark.executor.instances=4 \
--conf spark.executor.memory=2G \
--conf spark.executor.cores=2 \
--conf spark.driver.memory=1G \
--conf spark.jars=/home/pcp/delta-lake/connectors/dist/delta-core-shaded-assembly_2.12-0.1.0.jar,/home/pcp/delta-lake/connectors/dist/hive-delta_2.12-0.1.0.jar \
file:///home/pcp/spongebob/examples/spark-thrift-server/target/spark-thrift-server-1.0.0-SNAPSHOT-spark-job.jar
Then the Spark Thrift Server driver and executors will run on Kubernetes in cluster mode:

[pcp@master-0 ~]$ kubectl get po -n spark -o wide
NAME                                          READY   STATUS    RESTARTS   AGE    IP              NODE       NOMINATED NODE   READINESS GATES
spark-thrift-server-54001673a399bdb7-exec-1   1/1     Running   0          116m   10.233.69.130   minion-2   <none>           <none>
spark-thrift-server-54001673a399bdb7-exec-2   1/1     Running   0          116m   10.233.67.207   minion-0   <none>           <none>
spark-thrift-server-54001673a399bdb7-exec-3   1/1     Running   0          116m   10.233.68.14    minion-1   <none>           <none>
spark-thrift-server-54001673a399bdb7-exec-4   1/1     Running   0          116m   10.233.69.131   minion-2   <none>           <none>
spark-thrift-server-ac08d873a397a201-driver   1/1     Running   0          118m   10.233.67.206   minion-0   <none>           <none>

Take a look at the S3 path (here, s3a://mykidong/spark-thrift-server): the Spark application uber jar and dependency jars are uploaded there, then downloaded by the Spark Thrift Server driver and executors and added to their classloaders. You need external storage such as an S3 bucket or HDFS for these uploaded files.

To access the Spark Thrift Server as if it were HiveServer2, you can connect with any HiveServer2-compatible client.
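For example, a minimal Beeline sketch — the binary Thrift port 10016 comes from spark.hadoop.hive.server2.thrift.port in the spark-submit command above, and the IP is the driver pod's address from the pod listing (in practice you would expose the driver through a Kubernetes Service rather than a pod IP):

# Connect over the binary Thrift protocol (port 10016, as configured above);
# 10.233.67.206 is the driver pod IP from the listing.
beeline -u jdbc:hive2://10.233.67.206:10016 -e "show databases;"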