Python: Spark map function with pyspark


I am trying to use map with pyspark, using the following code:

rdd1 = sc.parallelize([1, 2, 3, 4])
rdd2 = rdd1.map(lambda x: x + 5)
rdd2.collect()
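(For context, this assumes sc is an existing SparkContext, such as the one the pyspark shell creates. A minimal self-contained sketch of the same snippet might look like the following, where "map-example" is just a hypothetical app name:

from pyspark import SparkContext

# Minimal local setup; "map-example" is a hypothetical app name.
sc = SparkContext("local[*]", "map-example")
rdd1 = sc.parallelize([1, 2, 3, 4])
rdd2 = rdd1.map(lambda x: x + 5)
print(rdd2.collect())  # expected output: [6, 7, 8, 9]
sc.stop()

)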
But when I run it, the following error is thrown:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 12, localhost, executor driver): java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:133)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:76)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:86)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.<init>(Unknown Source)
    at java.lang.ProcessImpl.start(Unknown Source)
    ... 14 more
Please help me.
Thanks in advance.

It looks like your executors cannot find the python executable. Are you sure it is installed on all data nodes, and have you set the
PYSPARK_DRIVER_PYTHON
variable correctly? Yes... I have set all the environment variables correctly, and I have also set the Anaconda path. Are both python and pyspark on the
PATH?
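A common workaround for this "Cannot run program "python"" error on Windows is to point Spark explicitly at the interpreter before the SparkContext is created. A minimal sketch, assuming the driver's own interpreter is also the one the workers should use:

import os
import sys

# Point both the driver and the Python workers at the current interpreter;
# sys.executable is the path of the running Python (e.g. the Anaconda one).
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark import SparkContext

sc = SparkContext("local[*]", "map-example")  # "map-example" is a hypothetical app name
print(sc.parallelize([1, 2, 3, 4]).map(lambda x: x + 5).collect())

Note that on a real cluster the interpreter path must exist on every worker node, not just on the driver: PYSPARK_PYTHON is what the executors use to spawn the Python worker processes, while PYSPARK_DRIVER_PYTHON only affects the driver.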