How do I set the default PySpark context in a Jupyter notebook?

When I launch my pyspark installation, it creates a Jupyter notebook that I can happily access in the browser. It also automatically creates objects such as the "sc" and "spark" contexts. Where can I override how these objects are initialized?

Launch a plain Python kernel in Jupyter instead; the auto-created notebook and contexts usually come from pyspark being launched with PYSPARK_DRIVER_PYTHON set to jupyter. Then add the environment variables for Spark and PySpark and prepend the PySpark libs to sys.path, e.g.:

import os, sys

# Point SPARK_HOME at the local Spark install and describe how the
# pyspark-shell should be submitted (local master, 2 cores, 2g driver memory)
os.environ['SPARK_HOME'] = '/home/mario/spark-2.1.0-bin-hadoop2.7'
os.environ['PYSPARK_SUBMIT_ARGS'] = "--master local[2] --driver-memory 2g pyspark-shell"
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'

# Prepend the bundled Py4J and PySpark libraries to the import path
sys.path.insert(0, '/home/mario/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip')
sys.path.insert(0, '/home/mario/spark-2.1.0-bin-hadoop2.7/python')
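As a quick optional sanity check, assuming the paths above match your install, you can confirm that the import now resolves to the bundled PySpark:

# Verify that pyspark is picked up from SPARK_HOME
import pyspark
print(pyspark.__version__)   # e.g. '2.1.0' for this install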
You can then customize the Spark initialization in a Jupyter cell, e.g.:

from pyspark.sql.session import SparkSession

# Build a session with custom settings; getOrCreate() returns an existing
# session instead if one is already running in this JVM
spark = (SparkSession.builder
    .appName('picapica')
    .config('spark.speculation', 'true')
    .getOrCreate())
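A minimal sketch of verifying the result, assuming the session above was created successfully:

# Confirm the session picked up the custom configuration
print(spark.version)                                           # Spark version string
print(spark.sparkContext.getConf().get('spark.speculation'))   # 'true'
print(spark.sparkContext.master)                               # 'local[2]' from PYSPARK_SUBMIT_ARGS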