Anaconda pyspark — error appears only in IPython, not in vanilla Python
If I start pyspark by typing
/usr/bin/pyspark
in the console, the sample code below runs without any error. However, if I use it together with IPython, via
$IPYTHON_OPTS="notebook" /usr/bin/pyspark # notebook
or
$IPYTHON=1 /usr/bin/pyspark
then an exception is raised.
Here is the code:
from pyspark import SparkContext,SparkConf
from pyspark import SQLContext
from pyspark.sql.types import *
# sc is a SparkContex object created when pyspark is invoked
sqc = SQLContext(sc)
This is the error message:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-1-f0bbbc9cdb50> in <module>()
3 from pyspark.sql.types import *
4 # sc is a SparkContex object created when pyspark is invoked
----> 5 sqc = SQLContext(sc)
/opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/lib/spark/python/pyspark/sql/context.py in __init__(self, sparkContext, sqlContext)
91 """
92 self._sc = sparkContext
---> 93 self._jsc = self._sc._jsc
94 self._jvm = self._sc._jvm
95 self._scala_SQLContext = sqlContext
AttributeError: 'module' object has no attribute '_jsc'
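The message "'module' object has no attribute '_jsc'" indicates that the name sc is bound to a module object rather than to a live SparkContext — i.e. the shell-created sc was never defined in this IPython session, or something shadowed it. A minimal, Spark-free sketch of the same failure mode (the module name here is only illustrative):

```python
import types

# Simulate `sc` accidentally being a module (e.g. a shadowed import)
# instead of a SparkContext instance.
sc = types.ModuleType("pyspark")

try:
    sc._jsc  # the same attribute access SQLContext.__init__ performs
except AttributeError as exc:
    print("AttributeError:", exc)
```

A quick sanity check in the notebook is therefore `type(sc)`: it should report a SparkContext, not a module.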
However, if I disable the Anaconda distribution and use the Python that ships with the system, everything works fine:
$ ipython --version
4.0.0
$ python --version
Python 2.7.3
$ cat /etc/issue
Debian GNU/Linux 7 \n \l
So the problem lies with Anaconda, but I still don't know what exactly it is.

I'm not sure about the specific error, since the vanilla and the Anaconda Spark setups should hit the same problem. However, you can check the following points:

Make sure the same Python version is installed on the driver and on the workers; differing versions can cause serialization problems.

IPYTHON_OPTS is generally deprecated. Instead, I define the following environment variables:
# tells pyspark to use notebook
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# tells pyspark to use the jupyter executable instead of python; in your case you might want this to be ipython instead
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/jupyter
# tells pyspark where the python executable is on the executors; it MUST be the same version of python (preferably with the same packages, if you are using them in a UDF or similar)
export PYSPARK_PYTHON=/opt/anaconda2/bin/python
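To act on the version-match point above, here is a small sketch for comparing the driver's Python version with the executors'; the sc-based part is commented out because it assumes a live SparkContext:

```python
import sys

# Driver-side Python version, e.g. "2.7"
driver_ver = "%d.%d" % sys.version_info[:2]
print("driver python:", driver_ver)

# With a live SparkContext you could collect the executors' versions too:
# exec_vers = (sc.parallelize(range(4), 4)
#                .map(lambda _: "%d.%d" % __import__("sys").version_info[:2])
#                .distinct().collect())
# assert exec_vers == [driver_ver], "driver/executor Python version mismatch"
```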
Of course, I see you did not add a master to the command line, so if you haven't changed the Spark defaults (i.e. there are no workers) this will run Spark locally.

I had the same problem with another package. Very annoying. Have you filed an issue about it somewhere?
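For completeness, a hedged sketch of a launch that sets the master explicitly instead of relying on the local default; the master URL and Anaconda paths here are assumptions to adapt to your cluster:

```shell
# use the Anaconda interpreters on both driver and executors
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/ipython
export PYSPARK_PYTHON=/opt/anaconda2/bin/python
# point at a standalone cluster master (assumed host/port) rather than local mode
/usr/bin/pyspark --master spark://master-host:7077
```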