PySpark: JSC is null when creating a Spark DataFrame
I'm trying to learn Spark, so please bear with me. I have the following problem: I can run a basic Spark example like this one
import os
os.environ['PYSPARK_PYTHON'] = '/g/scb/patil/andrejev/python36/bin/python3'
import random
from pyspark import SparkConf, SparkContext
from pyspark.sql.types import *
from pyspark.sql import *
sc.stop()
conf = SparkConf().setAppName('').setMaster('spark://remotehost:7789').setSparkHome('/path/to/spark-2.3.0-bin-hadoop2.7/')
sc = SparkContext(conf=conf)
num_samples = 100
def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
print(pi)
Here are the environment variables set on my local machine:
SPARK_HOME: '/usr/local/spark/'
PYSPARK_DRIVER_PYTHON: '/usr/bin/python3'
PYSPARK_DRIVER_PYTHON_OPTS: 'notebook'
PYSPARK_PYTHON: '/g/scb/patil/andrejev/python36/bin/python3'
PATH: '...:/usr/lib/jvm/java-8-oracle/jre/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/spark/bin'
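The driver-side variables above can also be set from Python itself, as long as this happens before `pyspark` is imported (the question's own snippet does exactly this for PYSPARK_PYTHON). A minimal stdlib-only sketch, reusing the paths from the question:

```python
import os

# PYSPARK_PYTHON must point at the same interpreter version on the driver
# and on every worker, otherwise executors fail when deserializing tasks.
os.environ['PYSPARK_PYTHON'] = '/g/scb/patil/andrejev/python36/bin/python3'

# The driver (e.g. the Jupyter kernel) may use a different interpreter.
os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/bin/python3'

# Only after this point should pyspark be imported, so that it picks
# these values up when launching the JVM and the workers.
```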
and on the remote machine:
PYSPARK_PYTHON=/g/scb/patil/andrejev/python36/bin/python3
PYSPARK_DIRVER_PYTHON=/g/scb/patil/andrejev/python36/bin/python3
In the end I found that I had two sessions running at the same time (the default one and the one I created). I fixed it by explicitly creating the DataFrame through my own session:
sess = SparkSession(sc)
freq_signal = sess.createDataFrame([Row(a=1, intlist=[1,2,3], mapfield={"a": "b"})])