Python: Py4JError when running a GlueContext script
Running spark-submit my_script.py to test Glue locally produces the following error:
File "C:\Spark\spark-3.1.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 384, in getOrCreate
File "C:\Spark\spark-3.1.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 147, in __init__
File "C:\Spark\spark-3.1.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 224, in _do_init
File "C:\Program Files\Anaconda3\lib\site-packages\py4j\java_gateway.py", line 1531, in __getattr__
"{0}.{1} does not exist in the JVM".format(self._fqn, name))
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.isEncryptionEnabled does not exist in the JVM
The variables are set as follows:
Am I missing anything in the variable settings?
Another step I took before reaching this point was moving aws-glue-libs\jarsv1 into
spark-3.1.1-bin-hadoop2.7\jars
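One thing worth checking in a setup like this: errors of the form "... does not exist in the JVM" are commonly reported when the Python-side pyspark/py4j code and the Spark classes loaded by the JVM come from different Spark versions. The sketch below is an illustration only, not a confirmed diagnosis of this case. It compares the version of the pip-installed pyspark package (if any) with the version embedded in the SPARK_HOME directory name. The helper names, and the assumption that SPARK_HOME follows the spark-<version>-bin-... naming pattern, are mine.

```python
import os
import re
from importlib import metadata
from typing import Optional


def spark_home_version(spark_home: str) -> Optional[str]:
    """Pull a version such as '3.1.1' out of a 'spark-3.1.1-bin-...' path."""
    match = re.search(r"spark-(\d+\.\d+\.\d+)", spark_home)
    return match.group(1) if match else None


def installed_pyspark_version() -> Optional[str]:
    """Return the pip-installed pyspark version, or None if it is not installed."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    jvm_side = spark_home_version(os.environ.get("SPARK_HOME", ""))
    python_side = installed_pyspark_version()
    print(f"SPARK_HOME version: {jvm_side}; pip pyspark version: {python_side}")
    if jvm_side and python_side and jvm_side != python_side:
        print("Mismatch: the Python package and the Spark distribution differ.")
```

This check does not cover jars copied into the Spark jars directory from another distribution, which may also cause the JVM to load classes from a different Spark version than the Python code expects.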
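You have an incorrect forward slash in your PYTHONPATH (between python and lib), but I'm not sure that is the root cause of the error. I have had success using a container for Glue testing (Windows is a very awkward host for running PySpark locally). If you use VSCode as your IDE, you may want to go down that route. Follow the guidance at the first link above to get your container running, then use my suggestions below to set it up to run PySpark. Here are the key parts of that configuration: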
Dockerfile:
# See here for image contents: https://github.com/microsoft/vscode-dev-containers/tree/v0.155.1/containers/python-3/.devcontainer/base.Dockerfile
# [Choice] Python version: 3, 3.9, 3.8, 3.7, 3.6
ARG VARIANT="3.8"
FROM mcr.microsoft.com/vscode/devcontainers/python:0-${VARIANT}
# Install our Python libraries via pip
COPY requirements.txt /tmp/pip-tmp/
RUN pip3 --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt \
    && rm -rf /tmp/pip-tmp
# Override the fake-glue files (from pip) with the real ones
# (the site-packages path below assumes VARIANT is 3.8)
RUN wget https://github.com/awslabs/aws-glue-libs/archive/glue-1.0.zip -O aws-glue.zip -nv \
    && unzip -q -o aws-glue.zip \
    && rsync --remove-source-files -a aws-glu*/awsglue/* /usr/local/lib/python3.8/site-packages/awsglue \
    && rm -fr aws-glu*
# Set up Java -- piping to /dev/null as this is super verbose
# (refresh the package index first so the install can find default-jdk)
RUN echo "installing Java" \
    && apt-get update -qq \
    && apt-get -qq -y install default-jdk > /dev/null
# Set up the AWS CLI
RUN echo "Downloading and installing AWS CLI" \
    && wget https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip -O aws.zip -nv \
    && unzip -q -o aws.zip \
    && rm -fr aws.zip \
    && ./aws/install \
    && mkdir blank \
    && rsync -a --delete blank/ aws/ \
    && rm -rf aws \
    && rm -rf blank
requirements.txt:
boto3==1.14.63
fake-awsglue==0.0.0.post20190320
findspark==1.4.2
numpy==1.19.5
pandas==1.2.0
py4j==0.10.9
pycodestyle==2.6.0
pylint==2.6.0
pyspark==3.0.1
python-dateutil==2.8.1
regex==2020.11.13
rope==0.18.0
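The pins above keep pyspark and py4j in step (pyspark 3.0.1 itself depends on py4j 0.10.9). A minimal sketch for confirming that what is installed in the container matches the pins, assuming the file is named requirements.txt in the working directory and contains only name==version lines; the function names are illustrative and not part of the original answer:

```python
from importlib import metadata
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines, ignoring blanks and '#' comments."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (pinned, installed)."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in pins.items():
        try:
            found: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.exists():
        problems = find_mismatches(parse_pins(requirements.read_text()))
        for name, (wanted, found) in problems.items():
            print(f"{name}: pinned {wanted}, installed {found}")
        if not problems:
            print("All pinned packages match the installed versions.")
    else:
        print("requirements.txt not found in the current directory.")
```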
devcontainer.json:
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.155.1/containers/python-3
{
    "name": "Python 3",
    "build": {
        "dockerfile": "Dockerfile",
        "context": "..",
        "args": {
            // Update 'VARIANT' to pick a Python version: 3, 3.6, 3.7, 3.8, 3.9
            "VARIANT": "3.8",
            // Options
            "INSTALL_NODE": "true",
            "NODE_VERSION": "lts/*"
        }
    },
    // Set *default* container specific settings.json values on container create.
    "settings": {
        "terminal.integrated.defaultProfile.linux": "/bin/bash",
        "python.pythonPath": "/usr/local/bin/python",
        "python.linting.enabled": true,
        "python.linting.pylintEnabled": true,
        "python.formatting.autopep8Path": "/usr/local/py-utils/bin/autopep8",
        "python.formatting.blackPath": "/usr/local/py-utils/bin/black",
        "python.formatting.yapfPath": "/usr/local/py-utils/bin/yapf",
        "python.linting.banditPath": "/usr/local/py-utils/bin/bandit",
        "python.linting.flake8Path": "/usr/local/py-utils/bin/flake8",
        "python.linting.mypyPath": "/usr/local/py-utils/bin/mypy",
        "python.linting.pycodestylePath": "/usr/local/py-utils/bin/pycodestyle",
        "python.linting.pydocstylePath": "/usr/local/py-utils/bin/pydocstyle",
        "python.linting.pylintPath": "/usr/local/py-utils/bin/pylint"
    },
    // Add the IDs of extensions you want installed when the container is created.
    "extensions": [
        "ms-python.python",
        "amazonwebservices.aws-toolkit-vscode",
        "ms-toolsai.jupyter",
        "mtxr.sqltools",
        "dbaeumer.vscode-eslint",
        "sirtori.indenticator"
    ],
    // Comment out to connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
    "remoteUser": "vscode"
}
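Once the container is built, a quick smoke test can confirm that PySpark starts at all before any Glue code is involved. The following is a minimal sketch, assuming pyspark and a JDK are available as set up above; run_smoke_test is a hypothetical helper name, and it reports problems as a string instead of raising, so a Py4JError like the one in the question would show up in the returned message.

```python
def run_smoke_test() -> str:
    """Start a local SparkSession, count a tiny range, and report the outcome."""
    try:
        # Imported lazily so the function still returns a message without pyspark.
        from pyspark.sql import SparkSession
    except ImportError:
        return "pyspark is not importable in this environment"

    try:
        spark = (
            SparkSession.builder
            .master("local[1]")      # single local worker thread
            .appName("smoke-test")
            .getOrCreate()
        )
        try:
            count = spark.range(5).count()
            return f"ok: Spark {spark.version} counted {count} rows"
        finally:
            spark.stop()
    except Exception as exc:  # includes Py4J errors raised while starting the JVM
        return f"Spark failed to start or run: {exc}"


if __name__ == "__main__":
    print(run_smoke_test())
```

If this prints an "ok" line, the Python and JVM sides of Spark are at least able to talk to each other, and any remaining failure is more likely to sit in the Glue-specific setup.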
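Thank you very much! I set things up following the non-Docker option. As for the variable path, I did fix it, but the problem persists. These are the instructions I used: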