Apache Zeppelin: Constructor org.apache.spark.api.PythonRDD does not exist (IPython notebook)


IPython notebook, started as described in the documentation (
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" ./bin/pyspark
), then filled in with:

from os import path
from tempfile import gettempdir

#from pyspark import SparkFiles


filename = path.join(gettempdir(), 'somefile.txt')

with open(filename, 'w') as f:
    f.writelines(['foo\n'*500])

#sc = SparkContext(appName="PythonSort")
sc.addFile(filename)

print 'sc.textFile(filename).count() =', sc.textFile(filename).count()

sc.stop()
Output:
sc.textFile(filename).count() = 500

Apache Zeppelin notebook output:

(, Py4JError('An error occurred while calling None.org.apache.spark.api.PythonRDD. Trace:\npy4j.Py4JException: Constructor org.apache.spark.api.PythonRDD([class org.apache.spark.rdd.MapPartitionsRDD, class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.Boolean, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.Accumulator]) does not exist\n\tat py4j.ReflectionEngine.getConstructor(ReflectionEngine.java:184)\n\tat py4j.ReflectionEngine.getConstructor(ReflectionEngine.java:202)\n\tat py4j.Gateway.invoke(Gateway.java:213)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:207)\n\tat java.lang.Thread.run(Thread.java:745)\n\n',), )

You can simplify the pyspark setup by using IPython.
Thanks, that library is useful on Windows.
%pyspark
# Then same as "IPython notebook"
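
A Py4J "Constructor ... does not exist" error means the JVM found no constructor matching the argument types sent from Python. One possible cause, which this thread does not confirm, is that the notebook loads pyspark Python files from a different Spark build than the one its JVM runs. The following is a minimal sketch for checking that, assuming Python 3; the helper name describe_pyspark_environment is hypothetical and not part of Spark or Zeppelin. It only reports where pyspark would be imported from, so the values can be compared between the IPython notebook and the Zeppelin %pyspark paragraph.

```python
import importlib.util
import os


def describe_pyspark_environment(environ=None):
    """Report where the Python side would load pyspark from.

    Hypothetical helper for illustration only; it does not start Spark.
    """
    environ = os.environ if environ is None else environ
    # find_spec returns None when no top-level pyspark module is importable.
    spec = importlib.util.find_spec("pyspark")
    return {
        "SPARK_HOME": environ.get("SPARK_HOME"),
        "PYTHONPATH": environ.get("PYTHONPATH"),
        "pyspark_location": spec.origin if spec is not None else None,
    }


for key, value in describe_pyspark_environment().items():
    print(key, "=", value)

# In a notebook where `sc` already exists, `sc.version` reports the
# Spark version of the running JVM for comparison.
```

If the reported locations differ between the two notebooks, or pyspark_location points outside the Spark installation the interpreter uses, aligning them is a reasonable first thing to try.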