UnknownHostException for a Java application running in a Docker container on an AWS ECS Fargate cluster


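I have a large Java application that I'm trying to run on a Fargate cluster in AWS. The image runs successfully in Docker on my local machine. When I run it on Fargate, it starts up successfully but eventually hits the following error, after which the application hangs: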

! java.net.UnknownHostException: 690bd678bcf4: 690bd678bcf4: Name or service not known
! at java.net.InetAddress.getLocalHost(InetAddress.java:1505) ~[na:1.8.0_151]
! at tracelink.misc.SingletonTokenDBO$.<init>(SingletonTokenDBO.scala:34) ~[habari.jar:8.4-QUARTZ-SNAPSHOT]
! at tracelink.misc.SingletonTokenDBO$.<clinit>(SingletonTokenDBO.scala) ~[habari.jar:8.4-QUARTZ-SNAPSHOT]
!... 10 common frames omitted
Caused by: ! java.net.UnknownHostException: 690bd678bcf4: Name or service not known
! at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_151]
! at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[na:1.8.0_151]
! at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[na:1.8.0_151]
! at java.net.InetAddress.getLocalHost(InetAddress.java:1500) ~[na:1.8.0_151]
!... 12 common frames omitted
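Some initial research suggests that the error is related to the contents of the /etc/hosts file in the container. So I wrote a small test program that shows the same behavior as the real application and also dumps the contents of /etc/hosts to stdout: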

import java.net.*;
import java.io.*;

public class NetworkTest {
   public static void main(String[] args) throws InterruptedException, IOException {
      // Dump /etc/hosts and the resolved hostname every 10 seconds
      while(true) {
         networkDump();
         Thread.sleep(10000);
      }
   }

   private static void networkDump() throws IOException {
      System.out.println("/etc/hosts:");
      System.out.println("");

      // try-with-resources closes the file after each dump
      try (BufferedReader reader = new BufferedReader(new FileReader("/etc/hosts"))) {
         String line;
         while((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      }
      System.out.println("");

      dumpHostname();
   }

   private static void dumpHostname() {
      try {
         // getLocalHost() resolves the container's own hostname; this fails if nothing maps it
         String hostname = InetAddress.getLocalHost().getHostName();
         System.out.printf("Hostname: %s\n\n", hostname);
      } catch(UnknownHostException e) {
         System.out.println(e.getMessage());
      }
   }
}
Dockerfile:

FROM openjdk:8

WORKDIR /site
ADD . /site

CMD ["java", "NetworkTest"]
The output I get from it in AWS looks like this:

/etc/hosts:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

3a5a4271a6e3: 3a5a4271a6e3: Name or service not known
Compare that with the output when it runs in Docker on my local machine:

> docker run networktest

/etc/hosts:
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.4  82691e2fb948

Hostname: 82691e2fb948
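The local version, which doesn't get the exception, has an entry for the hostname in /etc/hosts, whereas the AWS hosts file has none. I tried adding an /etc/rc.local file that manually appends the hostname to the end of the localhost line, and adding a RUN command to the Dockerfile that does the same. Neither had any effect.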


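Does anyone know whether there is a way to configure the image or the ECS task definition so that the hostname is set up correctly in AWS?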
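One way to confirm the difference described above is to check whether the container's hostname appears as a name on any line of /etc/hosts. This is a minimal sketch, assuming the hostname is available in the HOSTNAME environment variable (Docker normally sets it to the short container ID); the class HostsCheck and its method are hypothetical names for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class HostsCheck {

    /** Returns true if some non-comment line in the hosts file lists the given name. */
    static boolean isHostnameMapped(List<String> hostsLines, String hostname) {
        for (String raw : hostsLines) {
            // Drop trailing comments, then split "address name [alias...]" on whitespace
            String line = raw.split("#", 2)[0].trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            // fields[0] is the address; every following field is a hostname or alias
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostname)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: HOSTNAME holds the container hostname (Docker sets it by default)
        String hostname = System.getenv().getOrDefault("HOSTNAME", "unknown");
        List<String> lines = Files.readAllLines(Paths.get("/etc/hosts"));
        System.out.printf("%s mapped in /etc/hosts: %b%n", hostname, isHostnameMapped(lines, hostname));
    }
}
```

Given the AWS output above, isHostnameMapped returns false for 3a5a4271a6e3; given the local output, it returns true for 82691e2fb948, which lines up with which environment throws the UnknownHostException.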

Pointing the hostname at 127.0.0.1 with the following:

echo "127.0.0.1 $HOSTNAME" >> /etc/hosts
solved the problem for me.
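Docker generates /etc/hosts when a container starts, which is presumably why an edit made in a RUN step at build time doesn't survive and the entry has to be added at runtime instead. If you would rather do that from the Java process itself before anything calls InetAddress.getLocalHost(), the same append can be sketched as below. It assumes the process runs as root (so it may write /etc/hosts) and that HOSTNAME is set; the class HostsEntry is hypothetical. Unlike a bare `echo >>`, it skips the append when the name is already mapped and adds a newline first if the file doesn't end with one:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class HostsEntry {

    /** Appends "127.0.0.1 <hostname>" unless the name is already mapped; returns true if it appended. */
    static boolean ensureLoopbackEntry(Path hostsFile, String hostname) throws IOException {
        byte[] existing = Files.readAllBytes(hostsFile);
        String content = new String(existing, StandardCharsets.UTF_8);
        for (String raw : content.split("\\r?\\n")) {
            // Ignore comments; field 0 is the address, the rest are names/aliases
            String[] fields = raw.split("#", 2)[0].trim().split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostname)) {
                    return false; // already resolvable, nothing to do
                }
            }
        }
        // Avoid gluing the new entry onto the last line if it has no trailing newline
        boolean needsNewline = existing.length > 0 && existing[existing.length - 1] != '\n';
        String entry = (needsNewline ? "\n" : "") + "127.0.0.1 " + hostname + "\n";
        Files.write(hostsFile, entry.getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: HOSTNAME holds the container hostname
        String hostname = System.getenv("HOSTNAME");
        if (hostname == null || hostname.isEmpty()) {
            System.err.println("HOSTNAME is not set; nothing to do");
            return;
        }
        boolean added = ensureLoopbackEntry(Paths.get("/etc/hosts"), hostname);
        System.out.println(added ? "Added loopback entry for " + hostname : hostname + " already mapped");
    }
}
```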

I'm using Docker Compose, so I have a docker-compose.yml file like this:

version: '2'

services:
  myservice:
    command: ["/set-hostname.sh", "--", "/run-service.sh"]

Then, the set-hostname.sh file looks like this:

#!/bin/bash

set -e

# Drop the "--" separator; the remaining arguments are the real command
shift

# Map the container hostname to the loopback address so it resolves
echo "127.0.0.1 $HOSTNAME" >> /etc/hosts

# Replace this shell with the service command, keeping the arguments intact
exec "$@"

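I ran into exactly the same problem. As you already mentioned, the hostname doesn't mean much here. The only way to get the actual instance IP that is visible within the VPC is to use the AWS task metadata API, which is what I did in my case.

I wired up the following code to get the local host IP: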

try {
    final ResponseEntity<String> taskInfoResponse =
            this.restTemplate.getForEntity("http://169.254.170.2/v2/metadata", String.class);
    log.info("Got AWS task info: {}", taskInfoResponse);
    log.info("Got AWS task info: {}", taskInfoResponse.getBody());
    if (taskInfoResponse.getStatusCode() == HttpStatus.OK) {
        try {
            final ObjectNode jsonNodes = this.objectMapper.readValue(taskInfoResponse.getBody(), ObjectNode.class);
            // Containers[0].Networks[0].IPv4Addresses[0]; path() returns a MissingNode instead of null
            final JsonNode jsonNode = jsonNodes.path("Containers").path(0)
                    .path("Networks").path(0)
                    .path("IPv4Addresses").path(0);
            log.info("Got IP to use: {}", jsonNode);
            if (!jsonNode.isMissingNode()) {
                awsTaskInfo.setTaskAddress(InetAddress.getByName(jsonNode.asText()));
            }
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    } else {
        awsTaskInfo.setTaskAddress(InetAddress.getLoopbackAddress());
    }
} catch (ResourceAccessException e) {
    log.error("Failed to fetch AWS info", e);
    awsTaskInfo.setTaskAddress(InetAddress.getLoopbackAddress());
}
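If Spring's RestTemplate and Jackson aren't on the classpath, the same lookup can be sketched with only the JDK. This is a minimal sketch, not the approach from the answer itself: the class name TaskAddress, the 2-second timeout, and the regex-based extraction (a simplification standing in for a real JSON parser) are all assumptions. It keeps the same behavior of falling back to the loopback address when the endpoint can't be read:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskAddress {

    // Task metadata endpoint v2, as used in the answer
    static final String METADATA_URL = "http://169.254.170.2/v2/metadata";

    // Simplification: first string inside the first "IPv4Addresses" array
    private static final Pattern FIRST_IPV4 =
            Pattern.compile("\"IPv4Addresses\"\\s*:\\s*\\[\\s*\"([^\"]+)\"");

    static Optional<String> extractFirstIpv4(String metadataJson) {
        Matcher m = FIRST_IPV4.matcher(metadataJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static String fetch(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        // getInputStream() throws IOException for non-2xx responses
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Task IP from the metadata endpoint, or the loopback address if anything fails. */
    static InetAddress taskAddress() {
        try {
            Optional<String> ip = extractFirstIpv4(fetch(METADATA_URL, 2000)); // assumed timeout
            if (ip.isPresent()) {
                // An IP literal is parsed directly, so no hostname lookup is involved
                return InetAddress.getByName(ip.get());
            }
        } catch (IOException e) {
            System.err.println("Failed to fetch AWS task metadata: " + e);
        }
        return InetAddress.getLoopbackAddress();
    }

    public static void main(String[] args) {
        System.out.println("Task address: " + taskAddress().getHostAddress());
    }
}
```

Outside ECS the connection to 169.254.170.2 fails, so main prints the loopback address; inside a Fargate task it should print the task's VPC address.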

This is exactly the problem I had been struggling with for a long time. This solution worked for me:

ENTRYPOINT ["/bin/sh", "-c" , "echo 127.0.0.1 $HOSTNAME >> /etc/hosts && exec mvn spring-boot:run"]

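I ran into the same problem when trying to access S3 and SQS from Lambda. The solution was to not specify the region when creating the client instance. Instead of:

SqsAsyncClient.builder()
        .region(Region.of(region))
        .build();
do this:

SqsAsyncClient.create();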

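Same problem. Have you found a solution yet? Same problem here... How did you run that command? As part of the Dockerfile, by logging in to the instance after it started, or some other way?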