Running Apache Hive on Kubernetes (without YARN)


Is it possible to run Apache Hive on Kubernetes (without also running YARN on Kubernetes)?


I can't find any solid information about this online — is running Hive on Kubernetes an unusual thing to do?

Hive on MR3 runs on Kubernetes, because MR3 (a new execution engine for Hadoop and Kubernetes) provides native support for Kubernetes.


Please take a look at my blog post on this topic:

Assuming you are running Spark as the batch execution engine of your data lake, it is easy to run HiveServer2 on Spark — that is, the Spark Thrift Server, which is compatible with HiveServer2.
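(For context, a local sketch only, not a step in this Kubernetes setup: a standalone Spark distribution ships the Thrift server as a launch script that accepts the usual spark-submit options.)

# Local illustration only: Spark's bundled launcher for the Thrift server.
$SPARK_HOME/sbin/start-thriftserver.sh --master local[*] --hiveconf hive.server2.thrift.port=10016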

Before submitting the Spark Thrift Server to Kubernetes, you should install the Hive Metastore on Kubernetes. There is a good way to install the Hive Metastore on Kubernetes:
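As a quick sanity check once the metastore is up — the spark-submit command below expects it at thrift://metastore.hive-metastore.svc.cluster.local:9083, which implies a Service named metastore in the hive-metastore namespace exposing port 9083 (names inferred from that URI, not prescribed here):

# Verify the metastore Service assumed by the spark-submit configuration below.
kubectl get svc metastore -n hive-metastore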

Because spark-submit refuses to launch the Spark Thrift Server on Kubernetes in cluster mode (it rejects the HiveThriftServer2 main class outright), you can write a simple wrapper class whose main method runs the Thrift server class, like this:


public class SparkThriftServerRunner {

    // Delegate straight to Spark's Thrift server entry point; spark-submit
    // accepts this wrapper class in cluster mode where it would reject
    // HiveThriftServer2 itself.
    public static void main(String[] args) {
        org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(args);
    }
}
and build the Spark application uber jar with the Maven Shade plugin.
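As a rough sketch of that build step (assuming the Shade plugin is already configured in the project's pom.xml; the artifact name is taken from the spark-submit command below):

# Package SparkThriftServerRunner and all dependencies into one uber jar.
# The Shade plugin configuration lives in pom.xml (not shown here).
mvn clean package
ls target/spark-thrift-server-1.0.0-SNAPSHOT-spark-job.jar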

Now you are ready to submit the Spark Thrift Server to Kubernetes. To do so, run the following command:

spark-submit \
--master k8s://https://10.233.0.1:443 \
--deploy-mode cluster \
--name spark-thrift-server \
--class io.spongebob.hive.SparkThriftServerRunner \
--packages com.amazonaws:aws-java-sdk-s3:1.11.375,org.apache.hadoop:hadoop-aws:3.2.0 \
--conf spark.kubernetes.file.upload.path=s3a://mykidong/spark-thrift-server \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.container.image=mykidong/spark:v3.0.0 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.hadoop.hive.metastore.client.connect.retry.delay=5 \
--conf spark.hadoop.hive.metastore.client.socket.timeout=1800 \
--conf spark.hadoop.hive.metastore.uris=thrift://metastore.hive-metastore.svc.cluster.local:9083 \
--conf spark.hadoop.hive.server2.enable.doAs=false \
--conf spark.hadoop.hive.server2.thrift.http.port=10002 \
--conf spark.hadoop.hive.server2.thrift.port=10016 \
--conf spark.hadoop.hive.server2.transport.mode=binary \
--conf spark.hadoop.metastore.catalog.default=spark \
--conf spark.hadoop.hive.execution.engine=spark \
--conf spark.hadoop.hive.input.format=io.delta.hive.HiveInputFormat \
--conf spark.hadoop.hive.tez.input.format=io.delta.hive.HiveInputFormat \
--conf spark.sql.warehouse.dir=s3a://mykidong/apps/spark/warehouse \
--conf spark.hadoop.fs.defaultFS=s3a://mykidong \
--conf spark.hadoop.fs.s3a.access.key=bWluaW8= \
--conf spark.hadoop.fs.s3a.secret.key=bWluaW8xMjM= \
--conf spark.hadoop.fs.s3a.endpoint=http://10.233.25.63:9099 \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.fast.upload=true \
--conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
--conf spark.executor.instances=4 \
--conf spark.executor.memory=2G \
--conf spark.executor.cores=2 \
--conf spark.driver.memory=1G \
--conf spark.jars=/home/pcp/delta-lake/connectors/dist/delta-core-shaded-assembly_2.12-0.1.0.jar,/home/pcp/delta-lake/connectors/dist/hive-delta_2.12-0.1.0.jar \
file:///home/pcp/spongebob/examples/spark-thrift-server/target/spark-thrift-server-1.0.0-SNAPSHOT-spark-job.jar
Then the Spark Thrift Server driver and executors will run on Kubernetes in cluster mode:

[pcp@master-0 ~]$ kubectl get po -n spark -o wide
NAME                                          READY   STATUS    RESTARTS   AGE    IP              NODE       NOMINATED NODE   READINESS GATES
spark-thrift-server-54001673a399bdb7-exec-1   1/1     Running   0          116m   10.233.69.130   minion-2   <none>           <none>
spark-thrift-server-54001673a399bdb7-exec-2   1/1     Running   0          116m   10.233.67.207   minion-0   <none>           <none>
spark-thrift-server-54001673a399bdb7-exec-3   1/1     Running   0          116m   10.233.68.14    minion-1   <none>           <none>
spark-thrift-server-54001673a399bdb7-exec-4   1/1     Running   0          116m   10.233.69.131   minion-2   <none>           <none>
spark-thrift-server-ac08d873a397a201-driver   1/1     Running   0          118m   10.233.67.206   minion-0   <none>           <none>

Take a look at the S3 path (here, s3a://mykidong/spark-thrift-server): the Spark application uber jar and dependency jars are uploaded there, then downloaded by the Spark Thrift Server driver and executors and added to their classloaders. You need external storage such as an S3 bucket or HDFS for these uploaded files.

To access the Spark Thrift Server as if it were HiveServer2, you can connect with any HiveServer2-compatible client.
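For example, a minimal Beeline sketch — the binary Thrift port 10016 comes from spark.hadoop.hive.server2.thrift.port in the spark-submit command above, and the IP is the driver pod's address from the pod listing (in practice you would expose the driver through a Kubernetes Service rather than a pod IP):

# Connect over the binary Thrift protocol (port 10016, as configured above);
# 10.233.67.206 is the driver pod IP from the listing.
beeline -u jdbc:hive2://10.233.67.206:10016 -e "show databases;"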