How can I run Apache Spark jobs on AWS ECS Fargate instead of EMR/EC2?


AWS Elastic MapReduce does a lot, but it has some rough edges I'd like to avoid for some fairly cheap computation in Apache Spark. Specifically, I want to see whether I can run a (Scala) Spark application on AWS ECS/Fargate. Even better if I can get it running as just a single container in client/local mode.

I started by building a Spark distribution with the hadoop3 (for AWS STS support) and kubernetes profiles:

In the apache/spark git repository at tag v2.4.0:

./dev/make-distribution.sh --name hadoop3-kubernetes -Phadoop-3.1 -Pkubernetes -T4

Then I built a generic Spark docker image from that distribution:

docker build -t spark:2.4.0-hadoop3.1 -f kubernetes/dockerfiles/spark/Dockerfile .
Then in my project I built another docker image on top of that, copying the sbt-assembled uberjar into the working directory and setting the entrypoint to the spark-submit shell script:

# Dockerfile
# Extend the generic Spark image built above: add the application uberjar
# and make spark-submit the container entrypoint
FROM spark:2.4.0-hadoop3.1
COPY target/scala-2.11/my-spark-assembly.jar .
ENTRYPOINT [ "/opt/spark/bin/spark-submit" ]
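For Fargate to run this image it has to be pulled from a registry, typically ECR. A minimal sketch of the push, in which the repository name, account id, and region are all hypothetical placeholders:

# push the application image somewhere Fargate can pull from
# (repository name, account id, and region below are hypothetical)
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag my-spark-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-spark-app:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-spark-app:latest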
On my local machine I can run the application by supplying the appropriate arguments in the command section of a docker-compose service:

# docker-compose.yml
...
   command:
     - --master
     - local[*]
     - --deploy-mode
     - client
     - my-spark-assembly.jar
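For context, a minimal sketch of the complete compose file that fragment would sit in; the service and image names here are placeholders, not the originals:

# docker-compose.yml -- minimal full-file sketch
# (service and image names are placeholders)
version: "3"
services:
  spark-app:
    image: my-spark-app:latest
    command:
      - --master
      - local[*]
      - --deploy-mode
      - client
      - my-spark-assembly.jar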
Unfortunately, on Fargate ECS it fails almost immediately, writing the following stacktrace to CloudWatch:

Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:714)
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
at org.apache.spark.SparkConf$$anonfun$getOption$1.apply(SparkConf.scala:388)
at org.apache.spark.SparkConf$$anonfun$getOption$1.apply(SparkConf.scala:388)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.SparkConf.getOption(SparkConf.scala:388)
at org.apache.spark.SparkConf.get(SparkConf.scala:250)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:463)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:334)
at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:334)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: c0d66fa49434: c0d66fa49434: Name does not resolve
at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:946)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress$lzycompute(Utils.scala:939)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress(Utils.scala:939)
at org.apache.spark.util.Utils$$anonfun$localCanonicalHostName$1.apply(Utils.scala:996)
at org.apache.spark.util.Utils$$anonfun$localCanonicalHostName$1.apply(Utils.scala:996)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:996)
at org.apache.spark.internal.config.package$.<init>(package.scala:296)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
... 18 more
Caused by: java.net.UnknownHostException: c0d66fa49434: Name does not resolve
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
... 27 more
Has anyone made a similar attempt successfully?
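The direct trigger in the trace is that InetAddress.getLocalHost() cannot resolve the container's hostname (c0d66fa49434) under Fargate's networking. A common workaround for this class of failure, sketched below and not verified on Fargate specifically, is a wrapper entrypoint that makes the hostname resolvable, or pins Spark's bind address via SPARK_LOCAL_IP, before handing off to spark-submit:

#!/bin/sh
# entrypoint.sh -- hypothetical wrapper, not verified on Fargate:
# make the container's hostname resolvable, then hand off to spark-submit
echo "127.0.0.1 $(hostname)" >> /etc/hosts
# alternatively, Spark reads SPARK_LOCAL_IP when choosing its bind address:
# export SPARK_LOCAL_IP=127.0.0.1
exec /opt/spark/bin/spark-submit "$@"

The Dockerfile above would then use ENTRYPOINT [ "/entrypoint.sh" ] instead of invoking spark-submit directly.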

Fargate supports the "command" and "environment" parameters supplied in the task definition. And yes, to be clear, I translated the same command from docker-compose into the task definition as a comma-separated string.
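For reference, in the raw task definition that comma-separated console input becomes a string array in containerDefinitions. A sketch of registering such a task definition via the CLI, with hypothetical names, ARNs, and sizing throughout:

# register-task-def.sh -- family, role ARN, image, and cpu/memory sizes are hypothetical
aws ecs register-task-definition \
  --family spark-app \
  --requires-compatibilities FARGATE \
  --network-mode awsvpc \
  --cpu 1024 --memory 4096 \
  --execution-role-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
  --container-definitions '[{
      "name": "spark-app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-spark-app:latest",
      "command": ["--master", "local[*]", "--deploy-mode", "client", "my-spark-assembly.jar"],
      "essential": true
    }]'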