
Python Databricks Connect & PyCharm & remote SSH connection

Tags: python, ssh, pycharm, databricks-connect

Hey Overflowers,

I have run into a problem.

I have set up PyCharm to connect to an (Azure) VM over SSH:

  • So first I configured the SSH connection

  • I set up the path mappings

  • I created a conda environment from a terminal inside the VM, then downloaded and set up databricks-connect. I tested it from that terminal and it works fine

  • I set up the console in the PyCharm run configuration

  • However, when I try to start a Spark session (spark = SparkSession.builder.getOrCreate()), databricks-connect looks for the .databricks-connect file in the wrong folder and gives the following error (a minimal sketch of check_vm.py follows the full traceback below):

    Caused by: java.lang.RuntimeException: Config file /root/.databricks-connect not found. Please run
    databricks-connect configure
    to accept the end user license agreement and configure Databricks Connect. A copy of the EULA is provided below: Copyright (2018) Databricks, Inc.

    Full error + some warnings:

    20/07/10 17:23:05 WARN Utils: Your hostname, george resolves to a loopback address: 127.0.0.1; using 10.0.0.4 instead (on interface eth0)
    20/07/10 17:23:05 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    20/07/10 17:23:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    
    Traceback (most recent call last):
      File "/anaconda/envs/py37/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-2-23fe18298795>", line 1, in <module>
        runfile('/home/azureuser/code/model/check_vm.py')
      File "/home/azureuser/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
        pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
      File "/home/azureuser/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home/azureuser/code/model/check_vm.py", line 13, in <module>
        spark = SparkSession.builder.getOrCreate()
      File "/anaconda/envs/py37/lib/python3.7/site-packages/pyspark/sql/session.py", line 185, in getOrCreate
        sc = SparkContext.getOrCreate(sparkConf)
      File "/anaconda/envs/py37/lib/python3.7/site-packages/pyspark/context.py", line 373, in getOrCreate
        SparkContext(conf=conf or SparkConf())
      File "/anaconda/envs/py37/lib/python3.7/site-packages/pyspark/context.py", line 137, in __init__
        conf, jsc, profiler_cls)
      File "/anaconda/envs/py37/lib/python3.7/site-packages/pyspark/context.py", line 199, in _do_init
        self._jsc = jsc or self._initialize_context(self._conf._jconf)
      File "/anaconda/envs/py37/lib/python3.7/site-packages/pyspark/context.py", line 312, in _initialize_context
        return self._jvm.JavaSparkContext(jconf)
      File "/anaconda/envs/py37/lib/python3.7/site-packages/py4j/java_gateway.py", line 1525, in __call__
        answer, self._gateway_client, None, self._fqn)
      File "/anaconda/envs/py37/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
        format(target_id, ".", name), value)
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.ExceptionInInitializerError
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:99)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
        at py4j.Gateway.invoke(Gateway.java:250)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:251)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: Config file /root/.databricks-connect not found. Please run `databricks-connect configure` to accept the end user license agreement and configure Databricks Connect. A copy of the EULA is provided below: Copyright (2018) Databricks, Inc.
    This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. ("Databricks"). This Software shall be deemed part of the “Subscription Services” under the Agreement, or if the Agreement does not define Subscription Services, then the term in such Agreement that refers to the applicable Databricks Platform Services (as defined below) shall be substituted herein for “Subscription Services.”  Licensee's use of the Software must comply at all times with any restrictions applicable to the Subscription Services, generally, and must be used in accordance with any applicable documentation. If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software.  This license terminates automatically upon the termination of the Agreement or Licensee's breach of these terms.
    Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services. Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used.
    Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company.
    To accept this agreement and start using Databricks Connect, run `databricks-connect configure` in a shell.
        at com.databricks.spark.util.DatabricksConnectConf$.checkEula(DatabricksConnectConf.scala:41)
        at org.apache.spark.SparkContext$.<init>(SparkContext.scala:2679)
        at org.apache.spark.SparkContext$.<clinit>(SparkContext.scala)
        ... 13 more
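
    For reference, a minimal sketch of what check_vm.py does up to the failing line (only the spark = SparkSession.builder.getOrCreate() call is confirmed by the traceback; the rest is illustrative):

    # check_vm.py -- minimal sketch, reconstructed from the traceback above
    from pyspark.sql import SparkSession

    # With databricks-connect installed, this call looks for the
    # .databricks-connect config file; the traceback shows it resolving to
    # /root/.databricks-connect instead of /home/azureuser/.
    spark = SparkSession.builder.getOrCreate()

    # Simple round trip to verify the remote cluster answers.
    print(spark.range(10).count())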
    
    Here seems to be the official tutorial on how to do what you are trying to do (i.e. databricks-connect).

    Most likely it is a version mismatch around the .databricks-connect file.

    You need to use Java 8 rather than 11, Databricks Runtime 5.5 LTS or Databricks Runtime 6.1-6.6, and your Python version should be the same on both ends.
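
    Something like this sketch can verify those constraints from the same interpreter PyCharm uses (it assumes java and databricks-connect are on that session's PATH):

    # Sketch: check that the local toolchain matches the cluster.
    import subprocess, sys

    print(sys.version)                              # must match the cluster's Python version
    subprocess.run(["java", "-version"])            # should report 1.8.x (Java 8), not 11
    subprocess.run(["databricks-connect", "test"])  # built-in end-to-end connectivity test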

    Here are the steps they give:

    conda create --name dbconnect python=3.5
    pip uninstall pyspark
    pip install -U databricks-connect==5.5.*  # or 6.*.* to match your cluster version. 6.1-6.6 are supported
    
    Then you need the URL, the token, the cluster ID, the org ID, and the port. Finally, run these commands in the terminal:

    databricks-connect configure
    databricks-connect test
    

    There is plenty more to do after that, but this should be promising. Keep in mind that you need to make sure everything you use is compatible with everything else. Once it is all set up, then try to get the IDE (PyCharm) working (a possible workaround sketch follows).
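
    If the config file keeps being looked up under the wrong home directory, one option is to pass the same five values programmatically instead of relying on ~/.databricks-connect. A sketch, with the spark.databricks.service.* property names taken from the Databricks Connect docs as I remember them (verify before relying on them; all values are placeholders):

    # Sketch: configure Databricks Connect via SparkConf, bypassing the file.
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = (
        SparkConf()
        .set("spark.databricks.service.address", "https://<region>.azuredatabricks.net")
        .set("spark.databricks.service.token", "<personal-access-token>")
        .set("spark.databricks.service.clusterId", "<cluster-id>")
        .set("spark.databricks.service.orgId", "<org-id>")  # Azure only
        .set("spark.databricks.service.port", "15001")
    )

    spark = SparkSession.builder.config(conf=conf).getOrCreate()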

    From the error, I gather that you need to accept Databricks' terms and conditions and then follow the instructions for the PyCharm IDE:

    • CLI

      The license is displayed:

      Copyright (2018) Databricks, Inc.

      This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant to an Agreement ...

      Accept the license and supply the configuration values:

      Do you accept the above agreement? [y/N] y
      
      Set new config values (leave input empty to accept default):
      Databricks Host [no current value, must start with https://]:
      Databricks Token [no current value]:
      Cluster ID (e.g., 0921-001415-628) [no current value]:
      Org ID (Azure-only, see ?o=orgId in URL) [0]:
      Port [15001]:

    • The Databricks Connect configuration script automatically adds the package to your project configuration

      Python 3 clusters: go to Run > Edit Configurations

      Add PYSPARK_PYTHON=python3 as an environment variable (a sketch of the environment-variable route follows this list)

      (screenshot: Python 3 cluster configuration)
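
    A sketch of that environment-variable route; apart from PYSPARK_PYTHON, the DATABRICKS_* names are assumptions from the Databricks Connect docs, and in PyCharm they belong in Run > Edit Configurations > Environment variables rather than in code:

    # Sketch: the same connection values as environment variables.
    # All values are placeholders; set them before the first getOrCreate().
    import os

    os.environ["PYSPARK_PYTHON"] = "python3"
    os.environ["DATABRICKS_ADDRESS"] = "https://<region>.azuredatabricks.net"
    os.environ["DATABRICKS_API_TOKEN"] = "<personal-access-token>"
    os.environ["DATABRICKS_CLUSTER_ID"] = "<cluster-id>"
    os.environ["DATABRICKS_ORG_ID"] = "<org-id>"  # Azure only
    os.environ["DATABRICKS_PORT"] = "15001"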


    Finally, did you ever succeed in setting up a remote PyCharm SSH interpreter with Databricks? I am currently evaluating whether Databricks can do the job for a project I am working on.


    As far as I know, databricks-connect only helps with launching Spark jobs on the remote cluster, while the rest of the non-Spark code is executed locally…
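
    To illustrate that split with a small sketch (assuming a working databricks-connect setup):

    # Sketch: which side runs what under databricks-connect.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    total = spark.range(100).count()  # Spark job: runs on the remote cluster
    print(total * 2)                  # plain Python: runs on the local machine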

    I guess it is looking in the /root/ directory because you are running as root. Have you tried running the command as a normal user? Is the file in the same directory? Were you able to install databricks-connect inside the virtual environment?

    I have worked through the tutorial: it runs fine on my local machine, and it also works when I run it from a terminal inside the VM. The problem only appears when I want to use PyCharm over SSH.

    Could you try another IDE (Spyder, for example) and see whether the error repeats?

    I tried VS Code and it works fine, so it must be something about the configuration in PyCharm.

    Chances are you are right and the PyCharm configuration is at fault, or PyCharm still has some bugs around connecting to Databricks. Maybe you should get in touch with JetBrains about it.

    Thank you for the answer. Unfortunately I have already followed those steps; it works fine in the terminal but not in PyCharm.

    I guess some configuration file may be interfering. You could uninstall it together with all of its configuration, then reinstall to restore the defaults and try again.
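
    A quick way to see which home directory the PyCharm SSH session actually resolves, since the traceback shows the file being looked up under /root/ (sketch):

    # Sketch: print where ~/.databricks-connect resolves for this session.
    import os

    print("HOME =", os.environ.get("HOME"))
    path = os.path.expanduser("~/.databricks-connect")
    print(path, "exists:", os.path.exists(path))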