Can we run a pyspark Python script outside the pyspark shell?
My pyspark script is m.py, which contains:
l = [1,2,3,4,7,5,6,7,8,9,0]
k = sc.parallelize(l)
type(k)
When I submit m.py, I get:
SPARK_MAJOR_VERSION is set to 2, using Spark2
Traceback (most recent call last):
  File "/root/m.py", line 3, in <module>
    k = sc.parallelize(l)
NameError: name 'sc' is not defined
The error appears again:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "m.py", line 3, in <module>
    k = sc.parallelize(l)
NameError: name 'sc' is not defined
Yes, you can, but you have to make sure the PYTHONPATH is set correctly and initialize every object you want to use:
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
In your driver program, make sure you create a SparkContext variable first. As far as I can see, you used "sc" directly without initializing it. After that you can run your program:
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
import m  # note: "import m.py" is invalid syntax; the module name is just "m"