Setting up PySpark for a Jupyter notebook: worker and driver Python version mismatch?


I'm having trouble setting up and using PySpark locally.

I have a conda environment that my Jupyter notebook is associated with. Here is what I typed in the terminal to install and configure PySpark:

pip install pyspark
pip install findspark

which python3.6

export PYSPARK_DRIVER_PYTHON=  # results from 'which python3.6'
export PYSPARK_PYTHON=  # results from 'which python3.6'

python --version
# result: Python 3.6.12 :: Anaconda, Inc.


java -version
# java version 1.8.0_25
# SE Runtime Environment (build 1.8.0_25-b17)

pyspark
#... spark version 3.0.1, using python version 3.7.4 (default)
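
As a sanity check (this snippet is not part of the original steps, just a suggested sketch), you can run a cell in the notebook before creating any SparkContext to confirm that the kernel's interpreter and the PYSPARK_* variables actually agree:

import os
import sys

# Interpreter the notebook kernel (i.e. the PySpark driver) runs on
print("driver python:", sys.executable, "%d.%d" % sys.version_info[:2])

# Interpreters PySpark will hand to the driver and the workers;
# unset or mismatched values here lead to the error shown below
print("PYSPARK_PYTHON        =", os.environ.get("PYSPARK_PYTHON"))
print("PYSPARK_DRIVER_PYTHON =", os.environ.get("PYSPARK_DRIVER_PYTHON"))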

Here is the code in the Jupyter notebook that I am trying to run:

import pyspark
from pyspark.sql.types import *
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

try:
    conf = pyspark.SparkConf().set('spark.driver.host','127.0.0.1')
    sc = pyspark.SparkContext(master='local', appName='samsApp',conf=conf)
    sqlContext = SQLContext(sc)
    print("Binding")
except ValueError:
    print("Spark session already created")

# below code from stack overflow: how to create pyspark dataframe
cSchema = StructType([StructField("WordList", ArrayType(StringType()))])
test_list = [['Hello', 'world']], [['I', 'am', 'fine']]

df = sqlContext.createDataFrame(test_list,schema=cSchema)

df.show()
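
As an aside, findspark is installed above but never used in the notebook. If you do rely on it to locate the Spark installation, it is typically initialized before importing pyspark, roughly like this (a sketch, assuming findspark can find Spark via SPARK_HOME or the pip-installed pyspark package):

import findspark
findspark.init()  # locate Spark and put pyspark on sys.path

import pyspark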

The last line above (df.show()) produces an error:

Py4JJavaError: An error occurred while calling o41.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, macbook-pro, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/j.doe/anaconda3/envs/package_env/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 477, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
How can I fix this? I don't know how to align the worker and driver Python versions. Please advise; I haven't found a straightforward answer online that works for me.

Try the following:

export PYSPARK_PYTHON=<python path>
export PYSPARK_DRIVER_PYTHON=<jupyter path>
pyspark
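
If you prefer not to depend on shell exports, an alternative sketch (not part of the original answer) is to pin both variables to the notebook kernel's own interpreter from inside the notebook, before the SparkContext is created, so the driver and the workers are guaranteed to share the same minor version:

import os
import sys

# Make both the workers and the driver use the interpreter running this notebook
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable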