Pyspark Jupyter ImportError: No module named py4j.protocol, despite py4j being installed
I have read a few posts about the error I am now seeing when importing pyspark; some suggest installing py4j, which I have already done, yet I am still seeing the error.
I am using a conda environment; here are the steps (a sketch follows the list):
1. create a yml file and include the needed packages (including py4j)
2. create an env based on the yml
3. create a kernel pointing to the env
4. start the kernel in Jupyter
5. run `import pyspark`, which throws: ImportError: No module named py4j.protocol
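For reference, a minimal sketch of what steps 1–4 might look like; the environment name `pyspark-env`, the file name `environment.yml`, the package versions, and the kernel display name are illustrative assumptions, not details from the original post:

```yaml
# environment.yml -- hypothetical minimal environment (name and versions are assumptions)
name: pyspark-env
channels:
  - conda-forge
dependencies:
  - python=3.6
  - py4j
  - ipykernel
```

```bash
# Create the env from the yml, then register it as a Jupyter kernel
conda env create -f environment.yml
conda activate pyspark-env
python -m ipykernel install --user --name pyspark-env --display-name "PySpark (conda)"
```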
The issue was resolved by adding an env section to kernel.json that explicitly specifies the following variables:
"env": {
"HADOOP_CONF_DIR": "/etc/spark2/conf/yarn-conf",
"PYSPARK_PYTHON":"/opt/cloudera/parcels/Anaconda/bin/python",
"SPARK_HOME": "/opt/cloudera/parcels/SPARK2",
"PYTHONPATH": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.7-src.zip:/opt/cloudera/parcels/SPARK2/lib/spark2/python/",
"PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
"PYSPARK_SUBMIT_ARGS": " --master yarn --deploy-mode client pyspark-shell"
}
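For context, a complete kernel.json with this env section might look like the sketch below. The env values are taken from the answer above; the display_name and the argv entries follow the standard Jupyter kernel-spec layout and are assumptions about this particular setup:

```json
{
  "display_name": "PySpark (Spark2/YARN)",
  "language": "python",
  "argv": [
    "/opt/cloudera/parcels/Anaconda/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "HADOOP_CONF_DIR": "/etc/spark2/conf/yarn-conf",
    "PYSPARK_PYTHON": "/opt/cloudera/parcels/Anaconda/bin/python",
    "SPARK_HOME": "/opt/cloudera/parcels/SPARK2",
    "PYTHONPATH": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.7-src.zip:/opt/cloudera/parcels/SPARK2/lib/spark2/python/",
    "PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": " --master yarn --deploy-mode client pyspark-shell"
  }
}
```

Pointing PYTHONPATH at the py4j-*-src.zip that ships inside SPARK_HOME is what resolves the `No module named py4j.protocol` error, since the kernel then imports the py4j build matching the installed Spark.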
Did you add SPARK_HOME back? Yes, thanks.