PySpark内核(JupyterHub)能否在客户机模式下运行?
My current setup:

- Spark EC2 cluster with HDFS and YARN
- JupyterHub (0.7.0)
- PySpark kernel using Python 2.7

The very simple code I am using for this question:
rdd = sc.parallelize([1, 2])
rdd.collect()
The PySpark kernel, which works fine against Spark standalone, has the following environment variable in its kernel.json file:
"PYSPARK_SUBMIT_ARGS": "--master spark://<spark_master>:7077 pyspark-shell"
As described above, I added the HADOOP_CONF_DIR environment variable pointing to the directory that holds the Hadoop configuration, and changed the --master property in PYSPARK_SUBMIT_ARGS to "yarn-client". I can also confirm that no other jobs were running in the meantime and that the workers were registered correctly.
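For reference, after those changes the relevant entries in the kernel's env section would presumably look something like the following (the Hadoop configuration path below is only a placeholder, not the one from the original setup):

"HADOOP_CONF_DIR": "/path/to/hadoop/conf",
"PYSPARK_SUBMIT_ARGS": "--master yarn-client pyspark-shell"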
My impression was that a JupyterHub notebook with a PySpark kernel can be configured to run against YARN, so if this is indeed something I am doing wrong, please point it out.

For PySpark to work in YARN mode you have to do some additional configuration:

- Copy the hadoop-yarn-server-web-proxy-<version>.jar of your YARN cluster into the local /hadoop-<version>/share/hadoop/yarn/ directory (you need a local Hadoop installation).
- Copy the corresponding cluster configuration into the local /spark-<version>/conf/ directory.
- Copy the YARN configuration files of your cluster into the local /hadoop-<version>/etc/hadoop/ directory.
- Set the following environment variables:

export HADOOP_HOME=/hadoop-<version>
export SPARK_HOME=/spark-<version>
export HADOOP_CONF_DIR=/hadoop-<version>/etc/hadoop
export YARN_CONF_DIR=/hadoop-<version>/etc/hadoop

- Create the kernel definition:

vim /usr/local/share/jupyter/kernels/pyspark/kernel.json
{
  "display_name": "pySpark (Spark 2.1.0)",
  "language": "python",
  "argv": [
    "/opt/conda/envs/python35/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "PYSPARK_PYTHON": "/opt/conda/envs/python35/bin/python",
    "SPARK_HOME": "/opt/mapr/spark/spark-2.1.0",
    "PYTHONPATH": "/opt/mapr/spark/spark-2.1.0/python/lib/py4j-0.10.4-src.zip:/opt/mapr/spark/spark-2.1.0/python/",
    "PYTHONSTARTUP": "/opt/mapr/spark/spark-2.1.0/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "--master yarn pyspark-shell"
  }
}
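After restarting the kernel, a quick sanity check in a notebook cell confirms the context is really running on YARN. This is a minimal sketch, assuming sc is the SparkContext created by the shell.py startup script referenced above:

# Confirm the master and the YARN application id of the running context
print(sc.master)          # should print "yarn" (or "yarn-client" on older Spark)
print(sc.applicationId)   # YARN ids look like "application_<cluster-timestamp>_<id>"

# Re-run the trivial job from the question to make sure executors respond
rdd = sc.parallelize([1, 2])
print(rdd.collect())      # expected: [1, 2]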
I hope this helps.
I configure the master URL by simply passing it in as a parameter:
import findspark
findspark.init()
from pyspark import SparkContext
sc = SparkContext("yarn-client", "First App")
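Note that the "yarn-client" master URL is deprecated as of Spark 2.x. A minimal sketch of the equivalent call, assuming Spark 2.x and that HADOOP_CONF_DIR/YARN_CONF_DIR already point at the cluster configuration:

import findspark
findspark.init()

from pyspark import SparkConf, SparkContext

# "yarn" plus deploy mode "client" replaces the older "yarn-client" master string
conf = (SparkConf()
        .setMaster("yarn")
        .setAppName("First App")
        .set("spark.submit.deployMode", "client"))
sc = SparkContext(conf=conf)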