SparkException: Python worker failed to connect back when running a Spark action
When I try to run this line in pyspark:
arquivo = sc.textFile("dataset_analise_sentimento.csv")
I get the following error message:
Py4JJavaError: An error occurred while calling z:
org.apache.spark.api.python.PythonRDD.runJob.:
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 1 times, most recent failure:
Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver):
org.apache.spark.SparkException: Python worker failed to connect back.
I have tried the following steps:
- Checked the environment variables
- Followed the Apache Spark installation steps for Windows 10
- Tried different versions of Apache Spark (2.4.3 / 2.4.2 / 2.3.4)
- Disabled the Windows firewall and the antivirus I have installed
- Tried to initialize the SparkContext manually with sc = spark.SparkContext (a possible solution found here on Stack Overflow, which did not work for me); see the sketch after this list
- Tried changing the value of PYSPARK_DRIVER_PYTHON from jupyter to ipython, as described in this post, without success
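For reference, a minimal sketch of what that manual initialization might look like; the app name, the local[*] master and the take(5) smoke test are my own illustrative additions, not from the original post:

import os
import sys
from pyspark import SparkConf, SparkContext

# Point the workers at the same interpreter as the driver; an interpreter
# mismatch is a common cause of "Python worker failed to connect back".
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

conf = SparkConf().setAppName("sentiment-analysis").setMaster("local[*]")
sc = SparkContext(conf=conf)

arquivo = sc.textFile("dataset_analise_sentimento.csv")
print(arquivo.take(5))  # smoke test: read the first five lines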
Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.3.4. I just configured the following environment variables and now it works fine:
- HADOOP_HOME=C:\HADOOP
- JAVA_HOME=C:\JAVA\jdk-11.0.6
- PYSPARK_DRIVER_PYTHON=jupyter
- PYSPARK_DRIVER_PYTHON_OPTS=notebook
- PYSPARK_PYTHON=python
Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.4.3, using Jupyter Notebook with pyspark. I only configured the following environment variables, and now it works fine:
- HADOOP_HOME=C:\HADOOP
- JAVA_HOME=C:\JAVA\jdk-11.0.6
- PYSPARK_DRIVER_PYTHON=jupyter
- PYSPARK_DRIVER_PYTHON_OPTS=notebook
- PYSPARK_PYTHON=python
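If the variables are set but the error persists, a quick check is to confirm the driver actually sees them; this minimal snippet (my own addition, run from the same session that launches pyspark) prints the exact variables named above:

import os

# Print the environment variables the pyspark driver will inherit.
for var in ("HADOOP_HOME", "JAVA_HOME", "PYSPARK_DRIVER_PYTHON",
            "PYSPARK_DRIVER_PYTHON_OPTS", "PYSPARK_PYTHON"):
    print(var, "=", os.environ.get(var))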