Python Flask + Apache Spark deployed on Kubernetes

I am trying to deploy a Flask application with Apache Spark 3.1.1 on Kubernetes.

app.py
from flask import Flask
from pyspark.sql import SparkSession

app = Flask(__name__)
app.debug = True

@app.route('/')
def main():
    print("Start of Code")
    spark = SparkSession.builder.appName("Test").getOrCreate()
    sc = spark.sparkContext
    spark.stop()
    print("End of Code")
    return 'hi'

if __name__ == '__main__':
    app.run()
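Since the stated goal is to keep the driver always active and only trigger executors per request, one common pattern is to create the session once and reuse it across requests instead of stopping it inside the route. A minimal sketch of that pattern (using a stdlib stand-in class for `SparkSession`, since the real `SparkSession.builder.getOrCreate()` needs a live Spark environment; the names here are illustrative, not from the original post):

```python
from functools import lru_cache

class FakeSession:
    """Stand-in for pyspark.sql.SparkSession; real code would call
    SparkSession.builder.appName("Test").getOrCreate() instead."""
    def stop(self):
        pass

@lru_cache(maxsize=None)
def get_session():
    # Created lazily on the first request, then reused on every later one.
    # Spark's own getOrCreate() is already idempotent per JVM, so the cache
    # mainly documents the intent: never tear the driver down per request.
    return FakeSession()

def handle_request():
    spark = get_session()  # note: no spark.stop() here, the driver stays warm
    return 'hi'
```

With this shape, `spark.stop()` would only be called at application shutdown, not at the end of each route handler.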
Terminal - kubectl apply

kubectl apply -f ./hello-flask.yaml

PROBLEM: using the dashboard I can see executor pods being created during boot
(the idea is to keep the spark-driver always active and trigger spark-executors via an API call)
kubectl get pods
NAME READY STATUS RESTARTS AGE
hello-flask-86689bdf84-ckkj4 1/1 Running 0 5m33s
spark-on-kubernetes-811fd878ef3d3c16-driver 1/1 Running 0 5m31s
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-flask LoadBalancer 10.103.254.34 <pending> 5000:32124/TCP 6m1s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6m13s
spark-on-kubernetes-811fd878ef3d3c16-driver-svc ClusterIP None <none> 7078/TCP,7079/TCP,4040/TCP 5m59s
requirements.txt
flask
pyspark
Dockerfile
- NOTE: "spark-py" is the plain Spark image, obtained by running `./bin/docker-image-tool.sh -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build` in the `$SPARK_HOME` directory
- NOTE: I saved the image built from this Dockerfile in my local registry as "localhost:5000/k8tsspark"
hello-flask.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hello-flask
  name: hello-flask
spec:
  selector:
    matchLabels:
      app: hello-flask
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-flask
    spec:
      containers:
        - name: hello-flask
          image: localhost:5000/k8tsspark:latest
          command: [
            "/bin/sh",
            "-c",
            "/opt/spark/bin/spark-submit \
              --master k8s://https://192.168.49.2:8443 \
              --deploy-mode cluster \
              --name spark-on-kubernetes \
              --conf spark.executor.instances=2 \
              --conf spark.executor.memory=1G \
              --conf spark.executor.cores=1 \
              --conf spark.kubernetes.container.image=localhost:5000/k8tsspark:latest \
              --conf spark.kubernetes.container.image.pullPolicy=Never \
              --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
              --conf spark.kubernetes.pyspark.pythonVersion=3 \
              --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
              --conf spark.dynamicAllocation.enabled=false \
              local:///app/app.py"
          ]
          imagePullPolicy: Never
          ports:
            - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: hello-flask
  labels:
    app: hello-flask
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 5000
      protocol: TCP
      targetPort: 5000
  selector:
    app: hello-flask
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
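Worth noting: `--deploy-mode cluster` tells Spark to launch a *separate* driver pod (the `spark-on-kubernetes-...-driver` pod visible in `kubectl get pods`), so the Flask container and the Spark driver are different processes. If the intent is for the Flask process itself to be the long-lived driver that requests executors on demand, client mode is the usual route. A hedged, untested sketch of the changed flags, where `hello-flask-driver-svc` is an *assumed* headless Service name exposing the Flask pod so that executors can connect back to the driver (not something from the original post):

```shell
# Sketch only: client mode runs the driver inside this container instead of
# spawning a driver pod; executors reach it via spark.driver.host/port.
/opt/spark/bin/spark-submit \
  --master k8s://https://192.168.49.2:8443 \
  --deploy-mode client \
  --name spark-on-kubernetes \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=localhost:5000/k8tsspark:latest \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.driver.host=hello-flask-driver-svc \
  --conf spark.driver.port=7078 \
  local:///app/app.py
```

In client mode the Flask app and the SparkSession share a process, which matches the stated goal of keeping the driver always active.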
Terminal - kubectl service

minikube service
|-----------|-------------|-------------|---------------------------|
| NAMESPACE | NAME        | TARGET PORT | URL                       |
|-----------|-------------|-------------|---------------------------|
| default   | hello-flask | http/5000   | http://192.168.49.2:32124 |
|-----------|-------------|-------------|---------------------------|
Comments:

- Can you check the logs?
- The logs only show the usual JVM warnings: "WARNING: An illegal reflective access operation has occurred / WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int) / WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform / WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations / WARNING: All illegal access operations will be denied in a future release". After that it runs without errors and reports the status of spark-b1a81072f30844d1bd468c37be6bc31d (phase: Running).
- Check whether port 5000 is open on your system. Run a sample Flask app on your system and try to access it from the browser.
- I started from a working Flask app on Kubernetes (same setup, same port, no Spark). So I guess the problem is related to the Dockerfile (which originally ran the app directly, whereas now I launch the Flask app via spark-submit) or some other setting.