Python: unable to run .take(n) on a DataFrame in PySpark in a Jupyter notebook. Why?


Note: it is not all DataFrames, just one specific DataFrame (see the code below).

I can start a Spark session from Jupyter. I can even create a DataFrame using sqlContext.createDataFrame, but whenever I try to run an action on that DataFrame, such as .take or .head, it fails with an error like:

Py4JJavaError: An error occurred while calling o212.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\Users\khnajm\Spark\spark-2.4.4-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 377, in main
  File "C:\Users\khnajm\Spark\spark-2.4.4-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 372, in process
  File "C:\Users\khnajm\Spark\spark-2.4.4-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 393, in dump_stream
My JAVA_HOME, SPARK_HOME, HADOOP_HOME, and PATH environment variables are all set correctly. I also downloaded winutils.exe and placed it in %SPARK_HOME%\bin.
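
To confirm the variables are actually visible to the notebook kernel (a common pitfall when they are set in one shell but the kernel is launched from another), a quick diagnostic sketch like this can help:

import os

# Print the environment variables exactly as the Jupyter kernel sees them;
# if any come back as None, the kernel was started without them.
for var in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
    print(var, "=", os.environ.get(var))

# On Windows, winutils.exe should be present under %SPARK_HOME%\bin
print(os.path.exists(os.path.join(os.environ["SPARK_HOME"], "bin", "winutils.exe")))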

I am running jdk1.8.0 and spark-2.4.4-bin-hadoop2.7 with winutils for Hadoop 2.7. I have tried setting PYSPARK_PYTHON, PYSPARK_DRIVER_PYTHON, and PYSPARK_DRIVER_PYTHON_OPTS, but none of them seem to change anything.
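
For reference, one way to make sure these variables reach the notebook is to set them at the top of the notebook itself, before the Spark session is created. This is only a sketch: the JAVA_HOME path is a placeholder, the SPARK_HOME path is taken from the traceback above, and pointing HADOOP_HOME at SPARK_HOME is an assumption based on winutils.exe living in %SPARK_HOME%\bin.

import os, sys

os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0"  # placeholder path
os.environ["SPARK_HOME"] = r"C:\Users\khnajm\Spark\spark-2.4.4-bin-hadoop2.7"
os.environ["HADOOP_HOME"] = os.environ["SPARK_HOME"]  # assumes winutils.exe is in %SPARK_HOME%\bin

# Point both the driver and the workers at the same interpreter; a
# driver/worker Python mismatch is a classic cause of worker-side Py4JJavaError
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable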

I can run .take on a simpler DataFrame, but when I run it on the wordsDataFrame below it fails, even though it is of type DataFrame.
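
As an illustration, a "simpler DataFrame" here means one built directly from local rows, along these lines (a hypothetical reconstruction, since the post does not show it):

# Build a small DataFrame from local tuples and run the same action on it
simple_df = sqlContext.createDataFrame([(1, "hello world"), (2, "spark take test")], ["id", "SMS"])
simple_df.take(2)  # works: returns a list of two Row objects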

from pyspark.ml.feature import Tokenizer, CountVectorizer

# Tokenize the text in the SMS column
tokenizer = Tokenizer(inputCol="SMS", outputCol="words")
wordsDataFrame = tokenizer.transform(data_df)

# Remove the 20 most occurring documents, documents with non-numeric characters, and documents with <= 3 characters
cv_tmp = CountVectorizer(inputCol="words", outputCol="tmp_vectors")
cv_tmp_model = cv_tmp.fit(wordsDataFrame)
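
For completeness, the action that reproduces the error above is just the ordinary DataFrame call:

# This is the call that raises the Py4JJavaError shown above
wordsDataFrame.take(3)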