
Python KeyError: SPARK_HOME during SparkConf initialization


I am new to Spark and I want to run a Python script from the command line. I have tested pyspark interactively and it works fine. I get this error when trying to create sc:

File "test.py", line 10, in <module>
    conf=(SparkConf().setMaster('local').setAppName('a').setSparkHome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin'))
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway
    SPARK_HOME = os.environ["SPARK_HOME"]
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'SPARK_HOME'

There seem to be two problems here.

The first one is the path you are using. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should probably be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.
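
As a quick sanity check (only a sketch; bin/spark-submit is simply one file that a valid install root is expected to contain):

import os

root = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
wrong = "/home/dirk/spark-1.4.1-bin-hadoop2.6/bin"

# The install root contains bin/spark-submit; pointing SPARK_HOME at bin/ does not
print(os.path.exists(os.path.join(root, "bin", "spark-submit")))   # True
print(os.path.exists(os.path.join(wrong, "bin", "spark-submit")))  # False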

The second problem is how you use setSparkHome. If you check its docstring, its goal is to

    set path where Spark is installed on worker nodes

The SparkConf constructor, on the other hand, assumes that SPARK_HOME on the master is already set: pyspark.context.SparkContext._ensure_initialized calls pyspark.java_gateway.launch_gateway, which tries to read SPARK_HOME and fails.
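
A minimal illustration of the difference (the worker-side path below is made up): setSparkHome only records the spark.home configuration key for the workers; it never sets the SPARK_HOME environment variable that launch_gateway reads on the master.

import os
from pyspark import SparkConf

# SPARK_HOME must already be set, otherwise SparkConf() itself fails as in the traceback
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"

conf = SparkConf().setSparkHome('/opt/spark-on-workers')   # hypothetical worker-side path
print(conf.get('spark.home'))      # '/opt/spark-on-workers', just a config entry
print(os.environ["SPARK_HOME"])    # unchanged, still the value exported above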

To deal with this, you should set SPARK_HOME before you create the SparkConf:

import os
from pyspark import SparkConf

# Set SPARK_HOME to the installation root before SparkConf is constructed
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
conf = (SparkConf().setMaster('local').setAppName('a'))
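
For completeness, a minimal sketch of creating the context from this conf and running a trivial job:

from pyspark import SparkContext

sc = SparkContext(conf=conf)
print(sc.parallelize([1, 2, 3]).count())   # prints 3
sc.stop()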

What if I am trying to connect to a remote machine? Setting SPARK_HOME when trying to run a client (pyspark in this case) doesn't really make sense, does it? Shouldn't this requirement be removed?