Running pyspark code in IntelliJ
I have set up pyspark in IntelliJ by following these steps:
Here is the simple code I am trying to run:
#!/usr/bin/env python
from pyspark import SparkContext, SparkConf
import numpy as np

def p(msg):
    print("%s\n" % repr(msg))

# Sanity check: plain numpy works before involving Spark.
a = np.array([[1, 2, 3], [4, 5, 6]])
p(a)

# Creating the SparkContext is where the failure below occurs.
sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
ardd = sc.parallelize(a)
p(ardd.collect())
Here is the result of submitting the code:
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
File "/git/misc/python/ptest.py", line 14, in <module>
sc = SparkContext("local","ptest",SparkConf().setAppName("x"))
File "/shared/spark16/python/pyspark/conf.py", line 104, in __init__
SparkContext._ensure_initialized()
File "/shared/spark16/python/pyspark/context.py", line 245, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway()
File "/shared/spark16/python/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
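One way to see what is going wrong: `launch_gateway()` spawns `spark-submit` using environment variables, so they can also be set in the script itself before pyspark starts the JVM. This is a minimal sketch, not the accepted fix from the thread; the paths are assumptions taken from the traceback above, so adjust them to your installation:

```python
import os

# Assumed install location, inferred from the traceback -- adjust as needed.
os.environ.setdefault("SPARK_HOME", "/shared/spark16")

# launch_gateway() hands PYSPARK_SUBMIT_ARGS to spark-submit. When it is
# empty, spark-submit fails with "Error: Must specify a primary resource"
# and the JVM exits before reporting its port -- hence the exception above.
# "pyspark-shell" tells spark-submit to start the gateway without a script.
os.environ.setdefault("PYSPARK_SUBMIT_ARGS", "pyspark-shell")

# With the environment prepared, the SparkContext from the script above
# can be created as before:
# from pyspark import SparkContext, SparkConf
# sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
```

Setting the variables in an IntelliJ run configuration (as described below) achieves the same thing without editing the script.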
The `pyspark` executable itself has been deprecated since Spark 1.0. Has anyone gotten this working? In my case, the variable settings from other Q&As covered most, but not all, of the required settings. After much trial and error, it only worked once I added:

PYSPARK_SUBMIT_ARGS = pyspark-shell

to the pyspark run configuration.

Can you add your run configuration? @zero323 Ah, I found the missing link here: PYSPARK_SUBMIT_ARGS needs to be in the run configuration, in addition to the PYTHONPATH and SPARK_HOME settings for the pyspark shell shown in the other question. I have now added an answer for this; if you have further information, feel free to add your own answer. BTW: I am also still looking for how to run pyspark in the IntelliJ Python console.

PySparkShell and SparkSubmit work fine for me, but when I try to run it in IntelliJ I get "Exception: Java gateway process exited before sending the driver its port number".
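Pulling the thread together, the IntelliJ run configuration for the script needs roughly these environment variables. The Spark path is an assumption based on the traceback, and the exact py4j zip name depends on your Spark version, so check `$SPARK_HOME/python/lib`:

```shell
SPARK_HOME=/shared/spark16
PYTHONPATH=/shared/spark16/python:/shared/spark16/python/lib/py4j-<version>-src.zip
PYSPARK_SUBMIT_ARGS=pyspark-shell
```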