Running PySpark in PyCharm on macOS


On a Mac (macOS 10.14.5), I am trying to run a PySpark program in PyCharm (Professional Edition, 2019.2).

I know my simple PySpark program itself is fine, because when I run it from the terminal with spark-submit, outside PyCharm, against the Spark I installed via brew, it works correctly. I have tried linking PyCharm to that version of Spark, but ran into other issues.
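For context, the program really is trivial; a minimal sketch of it, reconstructed from the traceback below (run.py creates a SparkContext("local", "SimpleApp") on line 6), might look like the following. The parallelize/sum workload is a hypothetical stand-in for whatever the script actually does:

from pyspark import SparkContext

sc = SparkContext("local", "SimpleApp")  # the call that fails in the traceback below
rdd = sc.parallelize(range(10))          # hypothetical workload; any small job would do
print(rdd.sum())                         # prints 45 once the context starts correctly
sc.stop()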

I followed several sets of instructions online to install pyspark through PyCharm (Preferences -> Project Interpreter) and to set the SPARK_HOME environment variable to the corresponding venv directory (Run -> Edit Configurations -> Environment Variables), like this one. However, when I run the program I get the following error message:

Failed to find Spark jars directory (/Users/rahul/PycharmProjects/spark-demoII/venv/assembly/target/scala-2.12/jars).
You need to build Spark with the target "package" before running this program.
Traceback (most recent call last):
  File "/Users/rahul/PycharmProjects/spark-demoII/run.py", line 6, in <module>
    sc = SparkContext("local", "SimpleApp")
  File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/context.py", line 133, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/context.py", line 316, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/java_gateway.py", line 46, in launch_gateway
    return _launch_gateway(conf)
  File "/Users/rahul/virtualenvs/pyspark/lib/python3.7/site-packages/pyspark/java_gateway.py", line 108, in _launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

Process finished with exit code 1


It turns out this happens because pyspark has not yet been updated to work with the latest version of Java. After removing Java version 13, I made sure my Homebrew installation of Spark uses Java version 1.8. Then I added the following to the environment variables under Run -> Edit Configurations in PyCharm:

SPARK_HOME=/usr/local/Cellar/apache-spark/2.4.4/libexec


With these settings I can now run PySpark jobs in PyCharm.
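The same fix can also be applied in code, independent of the PyCharm run configuration, by exporting the variables before the SparkContext is created. This is a minimal sketch, assuming a Java 1.8 JDK is installed and discoverable via macOS's /usr/libexec/java_home (the Spark path is the one from the answer above; adjust the version to your install):

import os

# Must run before pyspark launches the JVM gateway.
os.environ["SPARK_HOME"] = "/usr/local/Cellar/apache-spark/2.4.4/libexec"
# Resolve a Java 1.8 home explicitly (assumes a 1.8 JDK is present).
os.environ["JAVA_HOME"] = os.popen("/usr/libexec/java_home -v 1.8").read().strip()

from pyspark import SparkContext
sc = SparkContext("local", "SimpleApp")  # the gateway now starts under Java 1.8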

Try this:

Right, that is one of the links whose directions I followed, but it did not work on my machine. Thanks anyway; I have updated the question to show this link.

I am also using PyCharm, but I did not need to set environment variables for the same setup. I set the project interpreter to my virtualenv Python; a simple pip install pyspark was all it took. I have my SPARK_HOME environment variable in my zshrc, pointing at a manual installation I did from the tar-ball.

I had tried that before, but with a version of Spark that did work. In any case, I tried it again: after switching to the virtual environment, I ran pip install pyspark. To make sure this version of Spark works, I ran spark-submit run.py (outside PyCharm), and got the error message below:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/rahul/.virtualenvs/test1/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
    at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
    at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:348)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:348)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3720)
    at java.base/java.lang.String.substring(String.java:1909)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
    ... 25 more
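The root cause is visible at the bottom of this trace: Hadoop's Shell class takes a substring(0, 3) of the Java version string, which throws for a two-character version like "13". In other words, it is the same Java mismatch the answer above fixes: Spark 2.4.x needs Java 1.8. A quick sanity check, sketched in Python, to confirm which Java spark-submit will pick up:

import subprocess

# `java -version` writes to stderr; expect something like "1.8.0_232" for Spark 2.4.x.
out = subprocess.run(["java", "-version"], capture_output=True, text=True)
print(out.stderr.strip())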