Python PySpark: spark-submit reportedly missing despite a correct path

I have set the following environment variables in my PyCharm debug configuration:

SPARK_HOME = /somewhere/spark-3.0.0-bin-hadoop2.7
PYTHONPATH = /somewhere/spark-3.0.0-bin-hadoop2.7/python
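
A quick way to sanity-check these (a sketch, not something from the original run) is to print what the interpreter actually sees under this run configuration; repr() would expose any stray quotes or whitespace saved along with the values:

import os

# Inspect the variables exactly as the PyCharm run configuration
# delivers them to the interpreter.
for name in ('SPARK_HOME', 'PYTHONPATH'):
    print(name, '=', repr(os.environ.get(name)))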
I am trying to run the following code:

import pyspark as ps
context = ps.SparkContext('myApp')
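
As an aside, and not what triggers the error below: the first positional parameter of SparkContext is master, not the application name, so 'myApp' above is actually being passed as the master URL. The usual form would be something like:

import pyspark as ps

# appName is normally given by keyword (or via a SparkConf);
# the first positional argument is the master URL, e.g. 'local[*]'.
context = ps.SparkContext(master='local[*]', appName='myApp')

The traceback below, however, occurs while launching the JVM gateway, before the master URL is ever parsed.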
The code errors out immediately, with the following traceback:

  File "/somewhere/venv/lib/python3.7/site-packages/pyspark/context.py", line 325, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)

  File "/somewhere/venv/lib/python3.7/site-packages/pyspark/java_gateway.py", line 95, in launch_gateway
proc = Popen(command, **popen_kwargs)

  File "/usr/lib/python3.7/subprocess.py", line 800, in __init__
restore_signals, start_new_session)

  File "/usr/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)

  FileNotFoundError: [Errno 2] No such file or directory: '/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit': '/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit'
Using the debugger, I traced execution to the following lines in pyspark/java_gateway.py:

56    SPARK_HOME = _find_spark_home()
57    # Launch the Py4j gateway using Spark's run command so that we pick up the
58    # proper classpath and settings from spark-env.sh
59    on_windows = platform.system() == "Windows"
60    script = "./bin/spark-submit.cmd" if on_windows else "./bin/spark-submit"
61    command = [os.path.join(SPARK_HOME, script)]
At line 56, SPARK_HOME holds exactly the value '/somewhere/spark-3.0.0-bin-hadoop2.7'. The error appears to come from line 60, where the script path starts with './'; that is what produces the extraneous '/./' seen in the final error message. Line 61 therefore yields:

'/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit'
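
A quick check suggests, though, that the '/./' by itself should be harmless: os.path.join() keeps the './' verbatim, but the operating system treats a '.' path component as a no-op, so both spellings resolve to the same file:

import os.path

joined = os.path.join('/somewhere/spark-3.0.0-bin-hadoop2.7', './bin/spark-submit')
print(joined)                    # /somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit
print(os.path.normpath(joined))  # /somewhere/spark-3.0.0-bin-hadoop2.7/bin/spark-submit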
After confirming that both spark-submit.cmd and spark-submit are present inside /somewhere/spark-3.0.0-bin-hadoop2.7/bin/, I tried to correct the path by replacing line 60 with the following, dropping the offending './':

60    script = "bin/spark-submit.cmd" if on_windows else "bin/spark-submit"
This gives me the following value for command at line 61:

'/somewhere/spark-3.0.0-bin-hadoop2.7/bin/spark-submit'
It still errors out, now with this final message:

FileNotFoundError: [Errno 2] No such file or directory: '/somewhere/spark-3.0.0-bin-hadoop2.7/bin/spark-submit': '/somewhere/spark-3.0.0-bin-hadoop2.7/bin/spark-submit'
This cannot be right, because the file does exist at that location, and the spark folder is definitely the right one: spark-shell is found there as well. What am I doing wrong? I also don't feel good about editing line 60. Please give me some advice; I am new to PySpark and this is already too deep for me. Many thanks.
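
For completeness, here is a minimal sketch (not something from the original post) that reproduces the failing call outside of Spark when run under the same PyCharm configuration; an ENOENT from Popen even though the file seems to exist often points at invisible characters in the path, or at a missing interpreter on the script's shebang line:

import os
from subprocess import Popen

spark_home = os.environ['SPARK_HOME']
script = os.path.join(spark_home, './bin/spark-submit')
print(repr(script))                  # repr() reveals hidden quotes, spaces or newlines
print(os.path.isfile(script))        # does the literal path resolve to a file?
print(os.access(script, os.X_OK))    # is it executable?
proc = Popen([script, '--version'])  # essentially the call java_gateway.py makes
proc.wait()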

I opened a terminal and executed the very string that was reported as an invalid path:

/somewhere/spark-3.0.0-bin-hadoop2.7/./bin/spark-submit
It gave me plenty of output:

20/10/15 00:57:25 WARN Utils: Your hostname, fila-vm-00 resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
20/10/15 00:57:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
So why does PyCharm complain?
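
Since the terminal run succeeds while PyCharm fails, one way to narrow this down (a sketch of my own, not something I have run) would be to dump the environment from inside the PyCharm run configuration and diff it against the terminal's:

import os

# Run once from the PyCharm configuration and once from a terminal,
# then diff the two dumps; any difference in SPARK_HOME, PATH or
# PYTHONPATH is a likely culprit.
for name in sorted(os.environ):
    print('%s=%r' % (name, os.environ[name]))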