
Python: Running PySpark in a Jupyter Notebook - Windows


I want to run PySpark from a Jupyter notebook. I downloaded and installed Anaconda, which comes with Jupyter, and wrote the following lines:

 from pyspark import SparkConf, SparkContext
 conf = SparkConf().setMaster("local").setAppName("My App")
 sc = SparkContext(conf = conf)
I get the following error:

ImportError                               Traceback (most recent call last)
<ipython-input-3-98c83f0bd5ff> in <module>()
  ----> 1 from pyspark import SparkConf, SparkContext
  2 conf = SparkConf().setMaster("local").setAppName("My App")
  3 sc = SparkContext(conf = conf)

 C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\__init__.py in   <module>()
 39 
 40 from pyspark.conf import SparkConf
  ---> 41 from pyspark.context import SparkContext
 42 from pyspark.rdd import RDD
 43 from pyspark.files import SparkFiles

 C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\context.py in <module>()
 26 from tempfile import NamedTemporaryFile
 27 
 ---> 28 from pyspark import accumulators
 29 from pyspark.accumulators import Accumulator
 30 from pyspark.broadcast import Broadcast

 ImportError: cannot import name accumulators
Following answers on Stack Overflow, I tried adding a PYTHONPATH environment variable pointing to the spark/python directory,

but that did not help.
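
(For reference, the usual form of that fix needs both spark/python and the bundled py4j archive on the path; pointing PYTHONPATH only at spark/python leaves py4j unresolved. A minimal per-session sketch, assuming the Spark location shown in the traceback above; adjust the py4j version to whatever sits in python\lib:)

import os
import sys

# Example install path taken from the traceback above; adjust to your machine.
spark_home = r"C:\software\spark\spark-1.6.2-bin-hadoop2.6"
os.environ["SPARK_HOME"] = spark_home

# pyspark needs both its own package and the py4j bridge on sys.path.
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.9-src.zip"))

from pyspark import SparkConf, SparkContext  # should now import cleanly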

This worked for me:

import os
import sys

# Raw string so the backslash in the Windows path is not treated as an escape.
spark_path = r"D:\spark"

os.environ['SPARK_HOME'] = spark_path
os.environ['HADOOP_HOME'] = spark_path

# Put Spark's Python bindings and the bundled py4j on the module search path.
sys.path.append(spark_path + "/bin")
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/pyspark/")
sys.path.append(spark_path + "/python/lib")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
sys.path.append(spark_path + "/python/lib/py4j-0.9-src.zip")

from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext("local", "test")
To verify:

In [2]: sc
Out[2]: <pyspark.context.SparkContext at 0x707ccf8>
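
As an extra sanity check, a tiny job should run end to end (a minimal sketch; any small computation will do):

# Count a small local RDD; the expected result is 100.
sc.parallelize(range(100)).count()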

2018 version

Installing PySpark on Windows 10 for Jupyter Notebook with Anaconda Navigator

Step 1: Download the packages (a quick version-check sketch follows this list)

1) spark-2.2.0-bin-hadoop2.7.tgz

2) Java JDK 8

3) Anaconda v5.2

4) scala-2.12.6.msi

5) hadoop v2.7.1
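
Once these are installed, you can confirm that the right Java and Python are on the PATH from any Python prompt (a minimal sketch; note that java -version writes to stderr, which still shows up in the console):

import subprocess
import sys

print(sys.version)                    # expect the Anaconda 3.x interpreter
subprocess.run(["java", "-version"])  # expect a 1.8.0_xxx build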

Step 2: Create a SPARK folder on the C:\ drive and extract everything into it

NOTE: While installing Scala, give it an install path inside the SPARK folder

Step 3: Now set new Windows environment variables (a per-session Python alternative follows this list)

  • HADOOP_HOME=C:\spark\hadoop

  • JAVA_HOME=C:\Program Files\Java\jdk1.8.0_151

  • SCALA_HOME=C:\spark\scala\bin

  • SPARK_HOME=C:\spark\spark\bin

  • PYSPARK_PYTHON=C:\Users\user\Anaconda3\python.exe

  • PYSPARK_DRIVER_PYTHON=C:\Users\user\Anaconda3\Scripts\jupyter.exe

  • PYSPARK_DRIVER_PYTHON_OPTS=notebook

  • Now select the Path variable:

    Click Edit and add a new entry

    Add "C:\spark\spark\bin" to the "Path" variable
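
If you want to try these settings inside a single notebook session before editing the system-wide variables, a minimal sketch in Python, reusing the example values from the list above (substitute your actual install locations):

import os

# Example values from the list above; adjust to your machine.
os.environ["HADOOP_HOME"] = r"C:\spark\hadoop"
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_151"
os.environ["SPARK_HOME"] = r"C:\spark\spark\bin"
os.environ["PYSPARK_PYTHON"] = r"C:\Users\user\Anaconda3\python.exe"

# PATH changes made here affect only this Python process and its children.
os.environ["PATH"] += os.pathsep + r"C:\spark\spark\bin"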

  • Step 4
    • Create a folder where your Jupyter notebook outputs and files will be stored
    • After that, open the Anaconda command prompt and cd into that folder
    • Then type pyspark
    That's it: your browser will pop up with Jupyter on localhost

    Step 5: Check that pyspark is working

    Type some simple code and run it:

    from pyspark.sql import Row
    a = Row(name='Vinay', age=22, height=165)
    print("a: ", a)
    # prints: a:  Row(age=22, height=165, name='Vinay')
    # (Spark before 3.0 sorts keyword fields alphabetically)
    



    No. I still get the same traceback, ending in ImportError: cannot import name accumulators.
    Another option is the findspark package, which locates your Spark installation and adds pyspark and py4j to sys.path for you:

    import findspark

    # Finds SPARK_HOME and wires up sys.path so pyspark can be imported.
    findspark.init()

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('test').getOrCreate()

    data = [(1, "siva", 100), (2, "siva2", 200), (3, "siva3", 300),
            (4, "siva4", 400), (5, "siva5", 500)]
    schema = ['id', 'name', 'salary']

    df = spark.createDataFrame(data, schema=schema)
    df.show()
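
    If everything is set up correctly, df.show() prints a table along these lines (an illustrative rendering of the expected output, not captured from a live session):

    +---+-----+------+
    | id| name|salary|
    +---+-----+------+
    |  1| siva|   100|
    |  2|siva2|   200|
    |  3|siva3|   300|
    |  4|siva4|   400|
    |  5|siva5|   500|
    +---+-----+------+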