Apache Spark spark-2.1.0-bin-hadoop2.7\python: CreateProcess error=5, Access is denied


I am trying to run this simple code in PySpark, but when I call collect() I get an "Access is denied" error. I don't understand what's wrong; I think I have all the necessary permissions.

x = sc.parallelize([("a", 1), ("b", 1), ("a", 1), ("a", 1),("b", 1), ("b", 1), ("b", 1), ("b", 1)], 3)
y = x.reduceByKey(lambda accum, n: accum + n)
for v in y.collect():
    print(v)
I run it locally, but I get this error:

CreateProcess error=5, Access is denied

    17/04/25 10:57:08 ERROR TaskSetManager: Task 2 in stage 0.0 failed 1 times; aborting job
    Traceback (most recent call last):
      File "C:/Users/rubeno/PycharmProjects/Pyspark/Twiiter_ETL.py", line 40, in <module>
        for v in y.collect():
      File "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python\pyspark\rdd.py", line 809, in collect
        port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
      File "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
      File "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
        return f(*a, **kw)
      File "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost, executor driver): java.io.IOException: Cannot run program "C:\Users\\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python": CreateProcess error=5, Access is denied
        at java.lang.ProcessBuilder.start(Unknown Source)

You need to set permissions on the whole pyspark directory:

Right-click the directory -> Properties -> Security tab, set "Full control" for "Everyone", and enable inheritance.
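The same permission change can also be applied from an elevated Command Prompt with `icacls` (a sketch only; the path is taken from the traceback above, adjust it to your install):

```shell
:: Grant "Everyone" full control over the pyspark directory, recursively.
:: (OI) = object inherit, (CI) = container inherit, F = full access, /T = apply to subtree
icacls "C:\Users\rubeno\Documents\spark-2.1.0-bin-hadoop2.7\python" /grant "Everyone:(OI)(CI)F" /T
```

Run it as Administrator, since changing ACLs on another user's files requires elevation.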

Error=5 is a permission issue. I had the same problem (in a Jupyter notebook) and solved it by starting Anaconda as administrator.
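Since error=5 means Windows refused to execute the target, and the traceback shows Spark trying to run the `...\spark-2.1.0-bin-hadoop2.7\python` directory itself, it can also be worth confirming that the worker interpreter points at a real executable. A minimal sketch, assuming you set this before creating the SparkContext (`PYSPARK_PYTHON` is Spark's documented environment variable; the check below is just illustrative):

```python
import os
import sys

# Point PySpark workers at the same interpreter that runs the driver script,
# instead of letting Spark fall back to a non-executable path.
os.environ["PYSPARK_PYTHON"] = sys.executable

# Sanity check: the target must be an executable file, not a directory
# (trying to run a directory is exactly what produces CreateProcess error=5).
worker = os.environ["PYSPARK_PYTHON"]
print(os.path.isfile(worker) and not os.path.isdir(worker))  # → True
```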