PySpark development environment in a Python 3.x / Ubuntu 16.04 Docker container
I am trying to set up a development environment that uses Apache Spark, specifically pyspark, inside a Docker container running Ubuntu 16.04. To keep the development environment consistent across developers, I require that all development happen inside a well-defined Docker container.

My problem is that when I run the pyspark executable, I cannot get past the following Java error:
rmarkbio@linuxkit-025000000001:~/project$ pyspark
Python 3.5.3+ (default, Nov 29 2017, 08:55:08)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:716)
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
at org.apache.spark.SparkConf.$anonfun$getOption$1(SparkConf.scala:389)
at scala.Option.orElse(Option.scala:306)
at org.apache.spark.SparkConf.getOption(SparkConf.scala:389)
at org.apache.spark.SparkConf.get(SparkConf.scala:251)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:463)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$3(SparkSubmit.scala:334)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: linuxkit-025000000001: linuxkit-025000000001: Name or service not known
at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:946)
at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:939)
at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:939)
at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:996)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:996)
at org.apache.spark.internal.config.package$.<init>(package.scala:302)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
... 16 more
Caused by: java.net.UnknownHostException: linuxkit-025000000001: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
... 24 more
conn_info_file: /tmp/tmpiuwhok7q/tmplief2cba
Traceback (most recent call last):
File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/shell.py", line 38, in <module>
SparkContext._ensure_initialized()
File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/context.py", line 316, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/java_gateway.py", line 46, in launch_gateway
return _launch_gateway(conf)
File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/java_gateway.py", line 109, in _launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
>>>
rmarkbio@linuxkit-025000000001:~/project/spark-2.4.2-bin-hadoop2.7$ ./bin/pyspark
Python 3.5.3+ (default, Nov 29 2017, 08:55:08)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
19/05/07 18:49:58 WARN Utils: Your hostname, linuxkit-025000000001 resolves to a loopback address: 127.0.0.1; using 192.168.65.3 instead (on interface eth0)
19/05/07 18:49:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/05/07 18:49:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.4.2
/_/
Using Python version 3.5.3+ (default, Nov 29 2017 08:55:08)
SparkSession available as 'spark'.
The image is built with:
docker build -t username/image_name:v000 .
and the container is instantiated with something like:
docker run -i -t \
--entrypoint /bin/bash \
--net="host" \
--name=container_name \
-v $(PWD):/home/username/project \
-v $(PWD)/../logs:/home/username/logs \
-v ~/.ssh/id_rsa:/root/.ssh/id_rsa \
username/image_name:v000
I feel like I have double-checked every version of Java, Scala, and Spark, along with their respective environment variables, but I cannot shake this error. Only a handful of people online mention it, and none of their suggestions helped. Given how rarely it comes up, though, I suspect I am missing something simple and obvious, since so many people use this stack.

I found the source of the problem: it has to do with Docker and how it handles hostnames inside a container. I also have an (unsatisfying) workaround.

Suppose I have downloaded the Spark tarball with wget and untarred it. After changing into its directory, I can try one of the example scripts that ships with it:
./bin/run-example SparkPi 10
and I kept hitting this error:
rmarkbio@linuxkit-025000000001:~/project/spark-2.4.2-bin-hadoop2.7$ ./bin/run-example SparkPi 10
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:716)
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
at org.apache.spark.SparkConf.$anonfun$getOption$1(SparkConf.scala:389)
at scala.Option.orElse(Option.scala:306)
at org.apache.spark.SparkConf.getOption(SparkConf.scala:389)
at org.apache.spark.SparkConf.get(SparkConf.scala:251)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:463)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$3(SparkSubmit.scala:334)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: linuxkit-025000000001: linuxkit-025000000001: Name or service not known
at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:946)
at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:939)
at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:939)
at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:996)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:996)
at org.apache.spark.internal.config.package$.<init>(package.scala:302)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
... 16 more
Caused by: java.net.UnknownHostException: linuxkit-025000000001: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
... 24 more
This says that the hostname (here, "linuxkit-025000000001") is not in /etc/hosts. And indeed, that is the case. So I went into that file and changed

127.0.0.1 localhost

to

127.0.0.1 linuxkit-025000000001 localhost

and it appears to work (as far as I can tell from the large volume of output, which I will spare you here).
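As a quick sanity check after editing /etc/hosts, the lookup that Spark performs can be reproduced from plain Python. This is just an illustrative snippet of mine, not part of Spark itself:

```python
import socket

# Spark's Utils.findLocalInetAddress does the JVM equivalent of this lookup;
# if it throws, Spark dies with the UnknownHostException shown above.
hostname = socket.gethostname()  # e.g. 'linuxkit-025000000001'
try:
    ip = socket.gethostbyname(hostname)
    print(hostname, "resolves to", ip)
except socket.gaierror as err:
    print(hostname, "does not resolve:", err)
```

If the /etc/hosts entry is in place, the first branch prints an address; before the fix, the except branch fires with the same "Name or service not known" failure.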
To double-check, let's try running the pyspark executable:
rmarkbio@linuxkit-025000000001:~/project$ pyspark
Python 3.5.3+ (default, Nov 29 2017, 08:55:08)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:716)
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
at org.apache.spark.SparkConf.$anonfun$getOption$1(SparkConf.scala:389)
at scala.Option.orElse(Option.scala:306)
at org.apache.spark.SparkConf.getOption(SparkConf.scala:389)
at org.apache.spark.SparkConf.get(SparkConf.scala:251)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:463)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$3(SparkSubmit.scala:334)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: linuxkit-025000000001: linuxkit-025000000001: Name or service not known
at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:946)
at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:939)
at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:939)
at org.apache.spark.util.Utils$.$anonfun$localCanonicalHostName$1(Utils.scala:996)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:996)
at org.apache.spark.internal.config.package$.<init>(package.scala:302)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
... 16 more
Caused by: java.net.UnknownHostException: linuxkit-025000000001: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
... 24 more
conn_info_file: /tmp/tmpiuwhok7q/tmplief2cba
Traceback (most recent call last):
File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/shell.py", line 38, in <module>
SparkContext._ensure_initialized()
File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/context.py", line 316, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/java_gateway.py", line 46, in launch_gateway
return _launch_gateway(conf)
File "/home/rmarkbio/project/spark-2.4.2-bin-hadoop2.7/python/pyspark/java_gateway.py", line 109, in _launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
>>>
rmarkbio@linuxkit-025000000001:~/project/spark-2.4.2-bin-hadoop2.7$ ./bin/pyspark
Python 3.5.3+ (default, Nov 29 2017, 08:55:08)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
19/05/07 18:49:58 WARN Utils: Your hostname, linuxkit-025000000001 resolves to a loopback address: 127.0.0.1; using 192.168.65.3 instead (on interface eth0)
19/05/07 18:49:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/05/07 18:49:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.4.2
/_/
Using Python version 3.5.3+ (default, Nov 29 2017 08:55:08)
SparkSession available as 'spark'.
It works.
So what is the story behind "linuxkit-025000000001"? This explains it nicely.
I would like this change to /etc/hosts to be made as soon as the container is created from the image, so I wanted the Dockerfile to do it. Unfortunately, that does not seem to be possible: so far I have found no way to do it from the Dockerfile, because Docker appears to block access to /etc/hosts until after the container has been created. Instead, I have to settle for an init.sh script that I run manually inside the container:
#!/bin/bash
# /etc/hosts is a bind mount that sed cannot modify in place, so patch a copy.
cp /etc/hosts /home/rmarkbio/project/hosts.new
sed -i "s/127.0.0.1 localhost/127.0.0.1 linuxkit-025000000001 localhost/" /home/rmarkbio/project/hosts.new
# Overwrite /etc/hosts with the patched copy; sudo -S reads the password from stdin.
echo "somepassword" | sudo -S cp -f /home/rmarkbio/project/hosts.new /etc/hosts
rm /home/rmarkbio/project/hosts.new
echo ''
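An alternative I have not fully vetted: docker run can establish the hostname mapping at container-creation time, which would make the init.sh workaround unnecessary. Note that --hostname conflicts with --net="host", so this only applies if you can drop host networking; the name spark-dev below is an arbitrary placeholder:

```shell
# Hypothetical variant of the docker run above (host networking removed):
# --hostname sets the container's hostname, and --add-host writes a matching
# entry into /etc/hosts when the container is created.
docker run -i -t \
    --entrypoint /bin/bash \
    --name=container_name \
    --hostname spark-dev \
    --add-host "spark-dev:127.0.0.1" \
    -v $(PWD):/home/username/project \
    -v $(PWD)/../logs:/home/username/logs \
    username/image_name:v000
```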