Apache Spark PySpark SparkContext NameError 'sc' in Jupyter
I am new to PySpark and want to use PySpark with an IPython notebook on my Ubuntu 12.04 machine. Below is the configuration for PySpark and the IPython notebook.
sparkuser@Ideapad:~$ echo $JAVA_HOME
/usr/lib/jvm/java-8-oracle
# Path for Spark
sparkuser@Ideapad:~$ ls /home/sparkuser/spark/
bin CHANGES.txt data examples LICENSE NOTICE R RELEASE scala-2.11.6.deb
build conf ec2 lib licenses python README.md sbin spark-1.5.2-bin-hadoop2.6.tgz
I have installed Anaconda2 4.0.0; the Anaconda path is:
sparkuser@Ideapad:~$ ls anaconda2/
bin conda-meta envs etc Examples imports include lib LICENSE.txt mkspecs pkgs plugins share ssl tests
Create a PySpark profile for IPython:
ipython profile create pyspark
sparkuser@Ideapad:~$ cat .bashrc
export SPARK_HOME="$HOME/spark"
export PYSPARK_SUBMIT_ARGS="--master local[2]"
# added by Anaconda2 4.0.0 installer
export PATH="/home/sparkuser/anaconda2/bin:$PATH"
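Since the startup file below depends on these variables, it can help to confirm that the notebook process actually sees them. A hypothetical check (my suggestion, not part of the original question), run in a notebook cell:
import os
# the startup script below needs SPARK_HOME and PYSPARK_SUBMIT_ARGS
print(os.environ.get('SPARK_HOME'))           # expect /home/sparkuser/spark
print(os.environ.get('PYSPARK_SUBMIT_ARGS'))  # expect --master local[2]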
Create a file named ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py:
sparkuser@Ideapad:~$ cat .ipython/profile_pyspark/startup/00-pyspark-setup.py
import os
import sys
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.5.2" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if not "pyspark-shell" in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
Launching the pyspark shell in a terminal:
sparkuser@Ideapad:~$ ~/spark/bin/pyspark
Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/04/22 21:06:55 INFO SparkContext: Running Spark version 1.5.2
16/04/22 21:07:27 INFO BlockManagerMaster: Registered BlockManager
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.5.2
/_/
Using Python version 2.7.11 (default, Dec 6 2015 18:08:32)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sc
<pyspark.context.SparkContext object at 0x7facb75b50d0>
>>>
In the browser (Jupyter notebook), typing the following command raises a NameError:
In [ ]: print sc
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-ee8101b8fe58> in <module>()
----> 1 print sc
NameError: name 'sc' is not defined
When I run the above command in the pyspark terminal it produces the expected output, but when I run the same command in Jupyter it throws the error above.
The above are the configuration settings for PySpark and IPython. How do I configure PySpark to work with Jupyter?

Here is a workaround I would suggest: try it without relying on pyspark to load the context for you.
Install the findspark Python package:
pip install findspark
If you installed the Jupyter notebook with Anaconda, use the Anaconda prompt or terminal instead:
$CONDA_PYTHON_EXE -m pip install findspark
Then simply import and initialize the SparkContext:
import findspark
findspark.init()

import pyspark  # import pyspark only after findspark.init()
from pyspark.sql import SparkSession  # Spark 2.x+

# findspark only puts pyspark on sys.path; the session/context still have to be created
spark = SparkSession.builder.appName("test").getOrCreate()
sc = spark.sparkContext
print(sc)
print(spark)
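As a quick check that the context actually works, you could then run a tiny job in the next cell (a hypothetical example, not part of the original answer, using the sc created above):
# hypothetical sanity check: a small RDD job through the freshly created context
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * x).collect())  # expected: [1, 4, 9, 16]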
Hi, you need to try a pyspark kernel. In a terminal:
mkdir -p ~/.ipython/kernels/pyspark
nano ~/.ipython/kernels/pyspark/kernel.json
Then copy in the following text, replacing /usr/bin/python with the path to your Python interpreter (kernel.json must be valid JSON, so use double quotes and no comments):
{
  "display_name": "pySpark (Spark 1.6.1)",
  "language": "python",
  "argv": [
    "/usr/bin/python",
    "-m", "IPython.kernel",
    "--profile=pyspark",
    "-f",
    "{connection_file}"
  ]
}
and save (Ctrl+X, then Y).
You should now have a "pyspark" kernel available in Jupyter.
At this point sc should already exist in your notebook (try calling sc in a cell); if it does not, try running the following lines:
import pyspark
conf = (pyspark.SparkConf().setAppName('test').set("spark.executor.memory", "2g").setMaster("local[2]"))
sc = pyspark.SparkContext(conf=conf)
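One caveat (my note, not from the original answer): if the kernel already created a context, instantiating a second SparkContext fails, so a more defensive sketch uses SparkContext.getOrCreate, available since Spark 1.4:
# Hedged sketch: reuse an existing SparkContext if the kernel already has one,
# otherwise create it with the same configuration as above.
import pyspark

conf = (pyspark.SparkConf()
        .setAppName('test')
        .set("spark.executor.memory", "2g")
        .setMaster("local[2]"))
sc = pyspark.SparkContext.getOrCreate(conf=conf)
print(sc.version)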
You should now have your sc up and running.
My simple advice is not to complicate the pyspark installation. For versions > 2.2 you can do a simple
pip install pyspark
to install the pyspark package. In addition, if you want Jupyter as well, do another pip install for jupyter:
pip install pyspark
pip install jupyter
Alternatively, if you want to use another version or a specific distribution of Spark, the earlier 3-minute method is:
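Either way, with a pip-installed pyspark (2.2+) nothing needs to be pre-loaded: a notebook cell can build the session itself. A minimal sketch, assuming a local master (the master setting and app name here are illustrative, not from the original answer):
# minimal sketch for pip-installed pyspark >= 2.2 in a plain Jupyter kernel
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[2]")
         .appName("jupyter-test")   # illustrative app name
         .getOrCreate())
sc = spark.sparkContext

print(sc.parallelize(range(10)).sum())  # quick check: prints 45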