How to create a distributed Spark cluster using Docker
I'm trying to create a distributed Spark cluster with just one worker, using this docker-compose file:
master:
  image: gettyimages/spark:2.0.0-hadoop-2.7
  command: bin/spark-class org.apache.spark.deploy.master.Master -h master
  hostname: master
  container_name: spark-master
  environment:
    SPARK_CONF_DIR: /conf
    SPARK_PUBLIC_DNS: <MASTER IP>
  expose:
    - 7001
    - 7002
    - 7003
    - 7004
    - 7005
    - 7077
    - 6066
  ports:
    - 4040:4040
    - 6066:6066
    - 7077:7077
    - 8080:8080
  volumes:
    - ./conf/master:/conf
    - ./data:/tmp/data
    - ~/spark/data/:/spark/data/

worker:
  image: gettyimages/spark:2.0.0-hadoop-2.7
  command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
  hostname: worker
  container_name: spark-worker
  environment:
    SPARK_CONF_DIR: /conf
    SPARK_WORKER_CORES: 2
    SPARK_WORKER_MEMORY: 1g
    SPARK_WORKER_PORT: 8881
    SPARK_WORKER_WEBUI_PORT: 8081
    SPARK_PUBLIC_DNS: <WORKER IP>
  links:
    - master
  expose:
    - 7012
    - 7013
    - 7014
    - 7015
    - 8881
  ports:
    - 8081:8081
  volumes:
    - ./conf/worker:/conf
    - ./data:/tmp/data
    - ~/apps/sparkapp/worker/data:/spark/data/
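For reference, here is a minimal way to bring this stack up and verify that the worker registered. It is only a sketch: it assumes the file is saved as docker-compose.yml, that the <MASTER IP>/<WORKER IP> placeholders are filled in, and that the image's working directory is Spark's home (which the relative bin/spark-class commands above imply).

# start master and worker in the background
docker-compose up -d

# the master web UI is published on host port 8080 above;
# it should list one ALIVE worker once registration completes
curl -s http://localhost:8080 | grep -i worker

# smoke-test the standalone master with a bundled example job
docker exec spark-master bin/run-example --master spark://master:7077 SparkPi 10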
But the problem is that the Docker daemon creates both containers on the same machine, which defeats the whole point of having a distributed cluster. How can I create a distributed Spark cluster using Docker?

If the problem is the Spark workers clashing on the same ports, you actually have two options:
Each subsequent worker started by docker-compose up --scale worker=2 will use a different port.
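Note that for --scale to work with the file above, the worker service cannot carry a fixed identity or fixed ports: container_name: spark-worker stops Compose from creating a second container, and two workers cannot both bind SPARK_WORKER_PORT 8881 or host port 8081. Here is a minimal sketch of an adjusted worker service, assuming you let Spark and Docker pick free ports:

worker:
  image: gettyimages/spark:2.0.0-hadoop-2.7
  command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
  # no hostname/container_name: fixed names collide once the service is scaled
  environment:
    SPARK_CONF_DIR: /conf
    SPARK_WORKER_CORES: 2
    SPARK_WORKER_MEMORY: 1g
    # SPARK_WORKER_PORT left unset so each worker picks a free random port;
    # the web UI stays on its in-container default of 8081
  links:
    - master
  ports:
    - "8081"   # container port only: Docker maps it to a different host port per worker
  volumes:
    - ./conf/worker:/conf
    - ./data:/tmp/data

With that, docker-compose up --scale worker=2 (or docker-compose scale worker=2 on older Compose releases) starts two workers, and docker ps shows which host port each web UI was mapped to. Keep in mind that scaling this way still places every container on the single Docker daemon you are talking to; actually spanning machines needs an orchestrator on top of Docker, such as Docker Swarm or Kubernetes.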