
SparkException: Python worker failed to connect back when executing a Spark action


When I try to execute this line in pyspark:

arquivo = sc.textFile("dataset_analise_sentimento.csv")
I get the following error message:

Py4JJavaError: An error occurred while calling z:
org.apache.spark.api.python.PythonRDD.runJob.: 
org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 0.0 failed 1 times, most recent failure:
Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver):
org.apache.spark.SparkException: Python worker failed to connect back.
I have tried the following steps:

  • Checked the environment variables
  • Checked the Apache Spark installation steps for Windows 10
  • Tried different versions of Apache Spark (2.4.3 / 2.4.2 / 2.3.4)
  • Disabled the Windows firewall and the antivirus I have installed
  • Tried to initialize the SparkContext manually with
    sc = spark.SparkContext
    (a possible solution I found on Stack Overflow; it did not work for me)
  • Tried changing the value of
    PYSPARK_DRIVER_PYTHON
    from
    jupyter
    to
    ipython
    , as described in this post, without success
None of the steps above worked for me, and I could not find a solution.
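As a quick diagnostic for this class of error, it can help to check what interpreter Spark will launch as a worker. This is a hedged sketch, not from the original post: it only prints the driver interpreter and what the PYSPARK_PYTHON setting (default "python") resolves to on PATH, since a worker interpreter that is missing or incompatible with the driver is a common cause of "Python worker failed to connect back".

```python
import os
import shutil
import sys

# Diagnostic sketch: Spark launches worker processes using the
# interpreter named by PYSPARK_PYTHON (falling back to "python").
# If that name does not resolve on PATH, or resolves to a Python
# incompatible with the driver, workers fail to connect back.
worker_python = os.environ.get("PYSPARK_PYTHON", "python")

print("driver python :", sys.executable)
print("worker setting:", worker_python, "->", shutil.which(worker_python))
```

If the worker setting resolves to None, or to a different Python installation than the driver, that mismatch is the first thing to fix.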

For context, I am using the following versions:

Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.4.3, with a Jupyter notebook and pyspark.

I just configured the following environment variables, and now it works fine:

  • HADOOP_HOME=C:\hadoop
  • JAVA_HOME=C:\Java\jdk-11.0.6
  • PYSPARK_DRIVER_PYTHON=jupyter
  • PYSPARK_DRIVER_PYTHON_OPTS=notebook
  • PYSPARK_PYTHON=python
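The variables above can also be set from inside a Python script, for readers who cannot (or prefer not to) edit the system environment. This is a minimal sketch assuming the paths from the list; they will differ on other machines, and the assignments must run before pyspark is imported to take effect.

```python
import os

# Same variables as the list above, applied in-process. The paths
# (C:\hadoop, C:\Java\jdk-11.0.6) are examples from this answer and
# must match your actual installation.
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["JAVA_HOME"] = r"C:\Java\jdk-11.0.6"
os.environ["PYSPARK_DRIVER_PYTHON"] = "jupyter"
os.environ["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
# Driver and workers should resolve to compatible interpreters;
# pointing PYSPARK_PYTHON at the wrong Python is a frequent cause
# of the "failed to connect back" error.
os.environ["PYSPARK_PYTHON"] = "python"
```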