
PySpark development environment in an Ubuntu 16.04 Docker container (Python 3.x)


I'm trying to set up a development environment that uses Apache Spark, and specifically pyspark, in a Docker container running Ubuntu 16.04. To keep the development environment consistent across the developers writing code, I require that all development take place inside a well-defined Docker container.
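
Roughly speaking, the image provides Java, Python 3, and the Spark binary distribution. A sketch of the relevant pieces (my reconstruction, not the actual Dockerfile, so package names, paths, and versions are assumptions):

apt-get update && apt-get install -y openjdk-8-jdk python3 python3-pip wget
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SPARK_HOME=/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_PYTHON=python3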

My problem is that when I run the pyspark executable, I can't get past the following Java error:

rmarkbio@linuxkit-025000000001:~/project$ pyspark
Python 3.5.3+ (default, Nov 29 2017, 08:55:08) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.SparkConf$.<init>(SparkConf.scala:716)
    at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
    at org.apache.spark.SparkConf.$anonfun$getOption$1(SparkConf.scala:389)
    at scala.Option.orElse(Option.scala:306)
    at org.apache.spark.SparkConf.getOption(SparkConf.scala:389)
    at org.apache.spark.SparkConf.get(SparkConf.scala:251)
    at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:463)
    at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
    at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$3(SparkSubmit.scala:334)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: linuxkit-025000000001: linuxkit-025000000001: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
    at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:946)
    at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:939)
    at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:939)
    at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:996)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:996)
    at org.apache.spark.internal.config.package$.<init>(package.scala:302)
    at org.apache.spark.internal.config.package$.<clinit>(package.scala)
    ... 16 more
Caused by: java.net.UnknownHostException: linuxkit-025000000001: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
    ... 24 more
conn_info_file:  /tmp/tmpiuwhok7q/tmplief2cba
Traceback (most recent call last):
  File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/shell.py", line 38, in <module>
    SparkContext._ensure_initialized()
  File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/context.py", line 316, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/java_gateway.py", line 46, in launch_gateway
    return _launch_gateway(conf)
  File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/java_gateway.py", line 109, in _launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
>>> 
rmarkbio@linuxkit-025000000001:~/project/spark-2.4.2-bin-hadoop2.7$ ./bin/pyspark 
Python 3.5.3+ (default, Nov 29 2017, 08:55:08) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
19/05/07 18:49:58 WARN Utils: Your hostname, linuxkit-025000000001 resolves to a loopback address: 127.0.0.1; using 192.168.65.3 instead (on interface eth0)
19/05/07 18:49:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/05/07 18:49:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.2
      /_/


Using Python version 3.5.3+ (default, Nov 29 2017 08:55:08)
SparkSession available as 'spark'.
The image is built with

docker build -t username/image_name:v000 .
and the container is started with something like

docker run -i -t \
    --entrypoint /bin/bash \
    --net="host" \
    --name=container_name \
    -v $(PWD):/home/username/project \
    -v $(PWD)/../logs:/home/username/logs \
    -v ~/.ssh/id_rsa:/root/.ssh/id_rsa \
    username/image_name:v000

I feel like I've carefully checked every version of Java, Scala, and Spark, along with their respective environment variables, but I can't make this error go away. Only a handful of people online mention this error, and none of those discussions were helpful. Given how rarely it comes up, though, I suspect I'm missing something simple and obvious, since plenty of people use this stack.

I found the root of the problem; it has to do with Docker and how it handles hostnames inside a container. I also have a workaround, though not a satisfying one.

Suppose I've downloaded the Spark tarball with wget and unpacked it.
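
For concreteness, that step looks roughly like this (the archive URL is my assumption; substitute whatever mirror was actually used):

wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz
tar -xzf spark-2.4.2-bin-hadoop2.7.tgz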

After changing into its directory, I can try one of the example scripts that ships with the distribution:

./bin/run-example SparkPi 10
I keep hitting this error:

rmarkbio@linuxkit-025000000001:~/project/spark-2.4.2-bin-hadoop2.7$ ./bin/run-example SparkPi 10
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.SparkConf$.<init>(SparkConf.scala:716)
    at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
    at org.apache.spark.SparkConf.$anonfun$getOption$1(SparkConf.scala:389)
    at scala.Option.orElse(Option.scala:306)
    at org.apache.spark.SparkConf.getOption(SparkConf.scala:389)
    at org.apache.spark.SparkConf.get(SparkConf.scala:251)
    at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:463)
    at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
    at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$3(SparkSubmit.scala:334)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: linuxkit-025000000001: linuxkit-025000000001: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
    at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:946)
    at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:939)
    at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:939)
    at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:996)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:996)
    at org.apache.spark.internal.config.package$.<init>(package.scala:302)
    at org.apache.spark.internal.config.package$.<clinit>(package.scala)
    ... 16 more
Caused by: java.net.UnknownHostException: linuxkit-025000000001: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
    ... 24 more
This indicates that the hostname (here, "linuxkit-025000000001") is not in /etc/hosts, and indeed it isn't. I should go into that file and change

127.0.0.1   localhost
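
to something along these lines (the same substitution that the init.sh script further down performs):

127.0.0.1   linuxkit-025000000001    localhost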

and that seems to work (as far as I can tell from the mountain of output, which I'll spare you here).

To double-check, let's try running the pyspark executable:

rmarkbio@linuxkit-025000000001:~/project$ pyspark
Python 3.5.3+ (default, Nov 29 2017, 08:55:08) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.SparkConf$.<init>(SparkConf.scala:716)
    at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
    at org.apache.spark.SparkConf.$anonfun$getOption$1(SparkConf.scala:389)
    at scala.Option.orElse(Option.scala:306)
    at org.apache.spark.SparkConf.getOption(SparkConf.scala:389)
    at org.apache.spark.SparkConf.get(SparkConf.scala:251)
    at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:463)
    at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
    at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$3(SparkSubmit.scala:334)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: linuxkit-025000000001: linuxkit-025000000001: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
    at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:946)
    at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:939)
    at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:939)
    at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:996)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:996)
    at org.apache.spark.internal.config.package$.<init>(package.scala:302)
    at org.apache.spark.internal.config.package$.<clinit>(package.scala)
    ... 16 more
Caused by: java.net.UnknownHostException: linuxkit-025000000001: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
    ... 24 more
conn_info_file:  /tmp/tmpiuwhok7q/tmplief2cba
Traceback (most recent call last):
  File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/shell.py", line 38, in <module>
    SparkContext._ensure_initialized()
  File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/context.py", line 316, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/java_gateway.py", line 46, in launch_gateway
    return _launch_gateway(conf)
  File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/java_gateway.py", line 109, in _launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
>>> 
rmarkbio@linuxkit-025000000001:~/project/spark-2.4.2-bin-hadoop2.7$ ./bin/pyspark 
Python 3.5.3+ (default, Nov 29 2017, 08:55:08) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
19/05/07 18:49:58 WARN Utils: Your hostname, linuxkit-025000000001 resolves to a loopback address: 127.0.0.1; using 192.168.65.3 instead (on interface eth0)
19/05/07 18:49:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/05/07 18:49:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.2
      /_/


Using Python version 3.5.3+ (default, Nov 29 2017 08:55:08)
SparkSession available as 'spark'.
It works.

So what's the story behind linuxkit-025000000001? This explains it well.

I'd like this change to /etc/hosts to be made as soon as the container is created from the image, which means I'd need the Dockerfile to do it. Unfortunately, that doesn't appear to be possible.

So far I haven't found a way to do this from the Dockerfile; Docker seems to be set up to block access to /etc/hosts until the container has actually been created. Instead, I have to settle for calling an init.sh script after the container is up:

#!/bin/bash
# Copy the container's /etc/hosts out to a scratch file that can be edited freely.
cp /etc/hosts /home/rmarkbio/project/hosts.new
# Add the container's hostname to the 127.0.0.1 loopback entry.
sed -i "s/127.0.0.1       localhost/127.0.0.1   linuxkit-025000000001    localhost/" /home/rmarkbio/project/hosts.new
# Overwrite /etc/hosts with the patched copy (root is needed, so the password is piped to sudo).
echo "somepassword" | sudo -S cp -f /home/rmarkbio/project/hosts.new /etc/hosts
rm /home/rmarkbio/project/hosts.new
echo ''
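
Two less invasive alternatives may be worth trying instead of patching /etc/hosts after startup; I haven't verified either in this exact setup, so treat both as sketches. The first follows the hint in the log itself ("Set SPARK_LOCAL_IP if you need to bind to another address"); the second asks Docker to write the mapping at container creation via docker run's --add-host flag, though how that interacts with --net="host" would need checking.

# Option 1: point Spark at an explicit local address so it never has to resolve the hostname
export SPARK_LOCAL_IP=127.0.0.1

# Option 2: have Docker add the hostname mapping itself when the container is created
docker run -i -t \
    --entrypoint /bin/bash \
    --add-host linuxkit-025000000001:127.0.0.1 \
    --name=container_name \
    username/image_name:v000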
