Python Flask + Apache Spark deployed on Kubernetes


I am trying to deploy a Flask application with Apache Spark 3.1.1 on Kubernetes.

app.py

from flask import Flask
from pyspark.sql import SparkSession
app = Flask(__name__)
app.debug = True

@app.route('/')
def main():
    print("Start of Code")
    spark = SparkSession.builder.appName("Test").getOrCreate()
    sc = spark.sparkContext
    spark.stop()
    print("End of Code")
    return 'hi'

if __name__ == '__main__':
    app.run()
requirements.txt

flask
pyspark
Dockerfile

  • NOTE: "spark-py" is the plain Spark image, obtained by running "./bin/docker-image-tool.sh -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build" inside the "$SPARK_HOME" directory

  • NOTE: I saved the image built from this Dockerfile in my local registry as "localhost:5000/k8tsspark"

hello-flask.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hello-flask
  name: hello-flask
spec:
  selector:
    matchLabels:
      app: hello-flask
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-flask
    spec:
      containers:
      - name: hello-flask
        image: localhost:5000/k8tsspark:latest
        command: [
          "/bin/sh",
          "-c",
          "/opt/spark/bin/spark-submit \
          --master k8s://https://192.168.49.2:8443 \
          --deploy-mode cluster \
          --name spark-on-kubernetes \
          --conf spark.executor.instances=2 \
          --conf spark.executor.memory=1G \
          --conf spark.executor.cores=1 \
          --conf spark.kubernetes.container.image=localhost:5000/k8tsspark:latest \
          --conf spark.kubernetes.container.image.pullPolicy=Never \
          --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
          --conf spark.kubernetes.pyspark.pythonVersion=3 \
          --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
          --conf spark.dynamicAllocation.enabled=false \
          local:///app/app.py"
        ]
        imagePullPolicy: Never
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: hello-flask
  labels:
    app: hello-flask
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 5000
    protocol: TCP
    targetPort: 5000
  selector:
    app: hello-flask
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
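
For comparison only: roughly the same settings could be expressed programmatically from inside the Flask container by building the SparkSession in client mode, so the driver would live in the hello-flask pod instead of a separate driver pod. The following is a sketch under that assumption, reusing the master URL, image and service account from the YAML above; it is not what the deployment above actually does.

from pyspark.sql import SparkSession

# Sketch: client-mode SparkSession built inside the Flask pod, reusing the
# values from hello-flask.yaml. In client mode the executors must be able to
# reach the driver, which typically also requires a headless service plus
# spark.driver.host / spark.driver.port settings (not shown here).
spark = (
    SparkSession.builder
    .appName("spark-on-kubernetes")
    .master("k8s://https://192.168.49.2:8443")
    .config("spark.executor.instances", "2")
    .config("spark.executor.memory", "1G")
    .config("spark.executor.cores", "1")
    .config("spark.kubernetes.container.image", "localhost:5000/k8tsspark:latest")
    .config("spark.kubernetes.container.image.pullPolicy", "Never")
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
    .config("spark.dynamicAllocation.enabled", "false")
    .getOrCreate()
)
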
TERMINAL - kubectl apply

kubectl apply -f ./hello-flask.yaml

PROBLEM: using the dashboard I can see executor pods being created while booting
(the idea is to keep spark-driver always active and trigger spark-executors via API call)

kubectl get pods
    NAME                                          READY   STATUS    RESTARTS   AGE
    hello-flask-86689bdf84-ckkj4                  1/1     Running   0          5m33s
    spark-on-kubernetes-811fd878ef3d3c16-driver   1/1     Running   0          5m31s

kubectl get svc
    NAME                                              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
    hello-flask                                       LoadBalancer   10.103.254.34   <pending>     5000:32124/TCP               6m1s
    kubernetes                                        ClusterIP      10.96.0.1       <none>        443/TCP                      6m13s
    spark-on-kubernetes-811fd878ef3d3c16-driver-svc   ClusterIP      None            <none>        7078/TCP,7079/TCP,4040/TCP   5m59s
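
The stated goal above (keep the spark-driver always active and only trigger executors via an API call) would mean creating the SparkSession once at startup and reusing it inside the route, rather than creating and stopping it per request as app.py does now. A minimal sketch of that pattern, with a made-up spark.range() job standing in for real work:

from flask import Flask
from pyspark.sql import SparkSession

app = Flask(__name__)

# Created once at startup, so the driver stays up for the lifetime of the app.
spark = SparkSession.builder.appName("Test").getOrCreate()

@app.route('/')
def main():
    # Each request only submits work to the existing session; no spark.stop()
    # here, otherwise the driver would shut down after the first call.
    count = spark.range(1000).count()
    return f"count={count}"

if __name__ == '__main__':
    app.run()
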
TERMINAL - kubectl service

minikube service
|-----------|-------------|-------------|---------------------------|
| NAMESPACE |    NAME     | TARGET PORT |            URL            |
|-----------|-------------|-------------|---------------------------|
| default   | hello-flask | http/5000   | http://192.168.49.2:32124 |
|-----------|-------------|-------------|---------------------------|

Comments:

  • Can you check the logs?
  • WARNING: An illegal reflective access operation has occurred; WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int); WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform; WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations; WARNING: All illegal access operations will be denied in a future release. After that it runs with no errors: status of spark-b1a81072f30844d1bd468c37be6bc31d (phase: Running).
  • Check whether port 5000 is open on your system. Run a sample Flask app on your system and try to access it from the browser.
  • I started from a working Flask app on Kubernetes (same setup, same port, no Spark), so I guess the problem is related to the Dockerfile (which originally ran the app directly, while now I launch the Flask app via spark-submit) or to some other setting.
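
Regarding the port-5000 suggestion in the comments, one detail worth checking (an assumption, not a confirmed cause): app.run() without arguments binds only to 127.0.0.1, which is not reachable through the Service's targetPort. A self-contained sketch with the explicit binding:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def main():
    return 'hi'

if __name__ == '__main__':
    # Listen on all interfaces: app.run() alone binds only to 127.0.0.1,
    # which the Service's targetPort 5000 cannot reach.
    app.run(host="0.0.0.0", port=5000)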