Py4JError when running a GlueContext script in Python

Running spark-submit my_script.py to test Glue locally produces the following error:

  File "C:\Spark\spark-3.1.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 384, in getOrCreate
  File "C:\Spark\spark-3.1.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 147, in __init__
  File "C:\Spark\spark-3.1.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 224, in _do_init
  File "C:\Program Files\Anaconda3\lib\site-packages\py4j\java_gateway.py", line 1531, in __getattr__
    "{0}.{1} does not exist in the JVM".format(self._fqn, name))
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.isEncryptionEnabled does not exist in the JVM
The variables are set as follows:

Is anything missing from the variable settings?
Another step I took before reaching this point was moving aws-glue-libs\jarsv1 into
spark-3.1.1-bin-hadoop2.7\jars
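
Copying one set of jars into another Spark distribution's jars folder can leave two versions of the same library side by side, and "does not exist in the JVM" errors are often a symptom of a version mismatch between the Python side and the jars that actually get loaded. The thread does not confirm that this is the cause here, so treat the following only as a minimal diagnostic sketch. The helper name find_conflicting_jars and the jar names in the usage note are illustrative assumptions, not something taken from the question.

```python
import os
import re
from collections import defaultdict

# Matches "<artifact>-<version>.jar", e.g. "spark-core_2.12-3.1.1.jar".
_JAR_RE = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_conflicting_jars(jar_names):
    """Group jar file names by artifact and return artifacts seen more than once.

    Hypothetical helper: a trailing Scala suffix such as "_2.12" is dropped so
    that builds of the same library for different Scala versions are grouped.
    """
    groups = defaultdict(set)
    for name in jar_names:
        match = _JAR_RE.match(name)
        if not match:
            continue  # skip files that do not look like versioned jars
        artifact = re.sub(r"_\d+\.\d+$", "", match.group("artifact"))
        groups[artifact].add(name)
    return {a: sorted(names) for a, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Assumes SPARK_HOME points at the Spark installation being used.
    spark_home = os.environ.get("SPARK_HOME")
    jars_dir = os.path.join(spark_home, "jars") if spark_home else None
    if jars_dir and os.path.isdir(jars_dir):
        for artifact, names in find_conflicting_jars(os.listdir(jars_dir)).items():
            print(artifact, "->", ", ".join(names))
```

For example, passing the illustrative names spark-core_2.12-3.1.1.jar and spark-core_2.11-2.4.3.jar reports both under the single artifact spark-core, which would be worth investigating before looking at the environment variables.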
Your PYTHONPATH contains an incorrect forward slash (between python and lib), but I am not sure that is the root cause of the error.
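
A stray forward slash like this is easy to miss by eye. As a rough sketch (the helper name mixed_separator_entries is hypothetical, and the path in the usage note is made up for illustration), the entries of PYTHONPATH that mix both separator styles can be listed like so:

```python
import os


def mixed_separator_entries(path_value, sep=os.pathsep):
    """Return the entries of a PATH-style string that contain both '/' and '\\'."""
    entries = [entry for entry in path_value.split(sep) if entry]
    return [entry for entry in entries if "/" in entry and "\\" in entry]


if __name__ == "__main__":
    # Reads the real PYTHONPATH; prints nothing if it is unset or consistent.
    for entry in mixed_separator_entries(os.environ.get("PYTHONPATH", "")):
        print("Mixed separators:", entry)
```

Given a made-up value such as C:\Spark\python/lib\py4j.zip;C:\ok\path with ";" as the separator, only the first entry is returned. A mixed separator is not necessarily fatal on Windows, so this only flags candidates for a closer look.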
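I have had success testing Glue inside a container (using Windows as the host for running PySpark locally is very complicated). If you use VSCode as your IDE, you may want to go down this route. Follow the guidance in the first link above to get your container running, then use my suggestions below to set it up to run PySpark. These are the key parts of that configuration:
Dockerfile: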

# See here for image contents: https://github.com/microsoft/vscode-dev-containers/tree/v0.155.1/containers/python-3/.devcontainer/base.Dockerfile

# [Choice] Python version: 3, 3.9, 3.8, 3.7, 3.6
ARG VARIANT="3.8"
FROM mcr.microsoft.com/vscode/devcontainers/python:0-${VARIANT}

# Install our Python libraries via PIP
COPY requirements.txt /tmp/pip-tmp/
RUN pip3 --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt \
  && rm -rf /tmp/pip-tmp

#override the fake-glue files (from pip) with the real ones
RUN wget https://github.com/awslabs/aws-glue-libs/archive/glue-1.0.zip -O aws-glue.zip -nv \
  && unzip -q -o aws-glue.zip \
  && rsync --remove-source-files -a aws-glu*/awsglue/* /usr/local/lib/python3.8/site-packages/awsglue \
  && rm -fr aws-glu*

#setup java -- piping to dev null as this is super verbose
RUN echo "installing Java" \
  && apt-get update -qq \
  && apt-get -qq -y install default-jdk > /dev/null

#setup AWS CLI
RUN echo "Downloading and installing AWS CLI" \
  && wget https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip -O aws.zip -nv \
  && unzip -q -o aws.zip \
  && rm -fr aws.zip \
  && ./aws/install \
  && mkdir blank \
  && rsync -a --delete blank/ aws/ \
  && rm -rf aws \
  && rm -rf blank

requirements.txt:

boto3==1.14.63
fake-awsglue==0.0.0.post20190320
findspark==1.4.2
numpy==1.19.5
pandas==1.2.0
py4j==0.10.9
pycodestyle==2.6.0
pylint==2.6.0
pyspark==3.0.1
python-dateutil==2.8.1
regex==2020.11.13
rope==0.18.0
devcontainer.json:

// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.155.1/containers/python-3
{
    "name": "Python 3",
    "build": {
        "dockerfile": "Dockerfile",
        "context": "..",
        "args": { 
            // Update 'VARIANT' to pick a Python version: 3, 3.6, 3.7, 3.8, 3.9
            "VARIANT": "3.8",
            // Options
            "INSTALL_NODE": "true",
            "NODE_VERSION": "lts/*",
        },
    },

    // Set *default* container specific settings.json values on container create.
    "settings": { 
        "terminal.integrated.defaultProfile.linux": "/bin/bash",
        "python.pythonPath": "/usr/local/bin/python",
        "python.linting.enabled": true,
        "python.linting.pylintEnabled": true,
        "python.formatting.autopep8Path": "/usr/local/py-utils/bin/autopep8",
        "python.formatting.blackPath": "/usr/local/py-utils/bin/black",
        "python.formatting.yapfPath": "/usr/local/py-utils/bin/yapf",
        "python.linting.banditPath": "/usr/local/py-utils/bin/bandit",
        "python.linting.flake8Path": "/usr/local/py-utils/bin/flake8",
        "python.linting.mypyPath": "/usr/local/py-utils/bin/mypy",
        "python.linting.pycodestylePath": "/usr/local/py-utils/bin/pycodestyle",
        "python.linting.pydocstylePath": "/usr/local/py-utils/bin/pydocstyle",
        "python.linting.pylintPath": "/usr/local/py-utils/bin/pylint"
    },

    // Add the IDs of extensions you want installed when the container is created.
    "extensions": [
        "ms-python.python",
        "amazonwebservices.aws-toolkit-vscode",
        "ms-toolsai.jupyter",
        "mtxr.sqltools",
        "dbaeumer.vscode-eslint",
        "sirtori.indenticator",
    ],

    // Comment out connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
    "remoteUser": "vscode"
}
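
Once the container is built, a quick check along the following lines can confirm that the pieces installed by the Dockerfile and requirements.txt are visible from Python. This is only a sketch: the function name check_environment is hypothetical, and the command and module names in the main block (java, aws, pyspark, py4j, awsglue) are simply the ones the configuration above is expected to provide.

```python
import importlib.util
import shutil


def check_environment(commands, modules):
    """Report whether each command is on PATH and each top-level module is importable."""
    report = {}
    for command in commands:
        report[command] = shutil.which(command) is not None
    for module in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    results = check_environment(["java", "aws"], ["pyspark", "py4j", "awsglue"])
    for name, found in results.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as MISSING points at the corresponding step in the Dockerfile or requirements.txt rather than at the Glue script itself.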

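Thank you very much! I set things up following the non-Docker option. As for the path variable, I did fix it, but the problem persists. These are the instructions I used: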