Python: setting the SPARK_HOME path variable on Windows and in PyCharm


I'm new to Spark and I'm trying to use it on Windows. I was able to download and install Spark 1.4.1 (the prebuilt version for Hadoop) successfully, in the following directory:

/my/spark/directory/bin
I can run spark-shell and pyspark.cmd from there and everything works fine. The only issue I still have to deal with is that I want to import pyspark while coding in PyCharm. Right now I'm using the following code to make things work:

import os
import sys

# Point Spark at the local installation before importing pyspark.
# Raw strings keep the Windows backslashes from being read as escapes.
os.environ['SPARK_HOME'] = r"C:\spark-1.4.1-bin-hadoop2.6"
sys.path.append(r"C:\spark-1.4.1-bin-hadoop2.6\python")
sys.path.append(r"C:\spark-1.4.1-bin-hadoop2.6\python\build")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
except ImportError as e:
    print("Error importing Spark modules:", e)
    sys.exit(1)
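
One thing worth noting: depending on the package, the bundled Py4J sources may sit in a zip under python/lib rather than in python/build, in which case the import still fails. A small hedged sketch that picks the zip up if it's there (py4j-0.8.2.1 is what the Spark 1.4.1 prebuilt package ships, but check your own python/lib directory):

import glob
import os
import sys

spark_home = os.environ['SPARK_HOME']

# The prebuilt 1.4.1 package ships Py4J as python/lib/py4j-0.8.2.1-src.zip;
# glob for it so the snippet survives a version bump.
for py4j_zip in glob.glob(os.path.join(spark_home, 'python', 'lib', 'py4j-*-src.zip')):
    sys.path.append(py4j_zip)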
I'd like to know if there's an easier way to do this. I'm on Windows 8 with Python 3.4 and Spark 1.4.1.

This is the function I usually use, shown below, to cut down on the repetition:

import os
import sys

def configure_spark(spark_home=None, pyspark_python=None):
    spark_home = spark_home or "/path/to/default/spark/home"
    os.environ['SPARK_HOME'] = spark_home

    # Add the PySpark directories to the Python path:
    sys.path.insert(1, os.path.join(spark_home, 'python'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'pyspark'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'build'))

    # If PYSPARK_PYTHON isn't specified, use the currently running Python binary:
    pyspark_python = pyspark_python or sys.executable
    os.environ['PYSPARK_PYTHON'] = pyspark_python
Then you can call the function before importing pyspark:

configure_spark('/path/to/spark/home')
from pyspark import SparkContext
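
To check that the whole setup works end to end, a minimal smoke test (the local master string and the toy RDD are purely illustrative; any small job will do):

configure_spark('/path/to/spark/home')

from pyspark import SparkContext

# Run a trivial job on a local master to verify that SPARK_HOME and
# the Python path are wired up correctly.
sc = SparkContext('local', 'smoke-test')
print(sc.parallelize([1, 2, 3]).count())  # expected output: 3
sc.stop()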