
Spark Docker: Java gateway process exited before sending its port number


I'm new to Docker and am trying to run a docker-compose file with Airflow and PySpark. Here's what I have so far:

version: '3.7'
services:
    master:
      image: gettyimages/spark
      command: bin/spark-class org.apache.spark.deploy.master.Master -h master
      hostname: master
      environment:
        MASTER: spark://master:7077
        SPARK_CONF_DIR: /conf
        SPARK_PUBLIC_DNS: localhost
      expose:
        - 7001
        - 7002
        - 7003
        - 7004
        - 7005
        - 7077
        - 6066
      ports:
        - 4040:4040
        - 6066:6066
        - 7077:7077
        - 8080:8080
      volumes:
        - ./conf/master:/conf
        - ./data:/tmp/data

    worker:
      image: gettyimages/spark
      command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
      hostname: worker
      environment:
        SPARK_CONF_DIR: /conf
        SPARK_WORKER_CORES: 2
        SPARK_WORKER_MEMORY: 1g
        SPARK_WORKER_PORT: 8881
        SPARK_WORKER_WEBUI_PORT: 8081
        SPARK_PUBLIC_DNS: localhost
      links:
        - master
      expose:
        - 7012
        - 7013
        - 7014
        - 7015
        - 8881
      ports:
        - 8081:8081
      volumes:
        - ./conf/worker:/conf
        - ./data:/tmp/data
    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
        logging:
            options:
                max-size: 10m
                max-file: "3"

    webserver:
        image: puckel/docker-airflow:1.10.9
        restart: always
        depends_on:
            - postgres
        environment:
            - LOAD_EX=y
            - EXECUTOR=Local
        logging:
            options:
                max-size: 10m
                max-file: "3"
        volumes:
            - ./dags:/usr/local/airflow/dags
            # Add this to have third party packages
            - ./requirements.txt:/requirements.txt
            # - ./plugins:/usr/local/airflow/plugins
        ports:
            - "8082:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3
I'm trying to run the following simple DAG to confirm that PySpark works:

import pyspark
from airflow.models import DAG
from airflow.utils.dates import days_ago, timedelta
from airflow.operators.python_operator import PythonOperator
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

import random

args = {
    "owner": "ian",
    "start_date": days_ago(1)
}

dag = DAG(dag_id="pysparkTest", default_args=args, schedule_interval=None)


def run_this_func(**context):
    sc = pyspark.SparkContext()
    print(sc)

with dag:
    run_this_task = PythonOperator(
        task_id='run_this',
        python_callable=run_this_func,
        provide_context=True,
        retries=10,
        retry_delay=timedelta(seconds=1)
    )

When I run this, it fails with the error "Java gateway process exited before sending its port number". I found several posts saying to run the command
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
so I tried running the command like this:

version: '3.7'
services:
    master:
      image: gettyimages/spark
      command: >
        sh -c "bin/spark-class org.apache.spark.deploy.master.Master -h master
        && export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell""
      hostname: master
...

But I still get the same error. Any idea what I'm doing wrong?
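For reference, here is a minimal sketch of what those posts suggest, applied inside the task callable instead of the master container's command: an export in the master container only affects that container's shell, not the process that actually launches pyspark. This assumes Java and the pyspark package are available inside the Airflow container that executes the task; the values are illustrative, not a confirmed fix.

import os
import pyspark

def run_this_func(**context):
    # Set PYSPARK_SUBMIT_ARGS in the process that launches the JVM,
    # before the SparkContext is created.
    os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"
    sc = pyspark.SparkContext()
    print(sc)
    sc.stop()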

I don't think you need to modify the master's command. Just leave it as it is.

Also, how is the Python code that you run in a different container going to connect to the master container? I think you should add the master URL to the Spark context, for example:

def run_this_func(**context):
    sc = pyspark.SparkContext("spark://master:7077")
    print(sc)
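An equivalent way to write that, as a sketch: the master URL spark://master:7077 comes from the compose file above, while the app name is only illustrative.

import pyspark

def run_this_func(**context):
    # Pass the standalone master URL explicitly via SparkConf
    # and stop the context when the task is done.
    conf = pyspark.SparkConf() \
        .setMaster("spark://master:7077") \
        .setAppName("pysparkTest")
    sc = pyspark.SparkContext(conf=conf)
    print(sc)
    sc.stop()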

Which container produces this error?

The Spark container. With this docker-compose I can run pure-Python DAG files without any problems.

@DBA108642 Can you be more precise? You don't have a spark container. Do you see the error in master or in worker? Have you tried running the original gettyimages docker-compose? Does it work, or does it fail with the same issue?

I haven't tried that yet; I just went to the gettyimages repo and added their docker-compose to mine, but I'll give it a shot.

@DBA108642 Did you solve the problem?

Unfortunately this didn't solve my problem, and I had to take a different approach.