Apache Flink: TaskManager cannot connect to JobManager in Docker Swarm tasks


I can't get the TaskManagers to communicate with the JobManager on a Docker Swarm stack.

The contents of the stack.yml file I use to deploy the Docker stack are:

version: "3"
services:
  jobmanager:
    image: affo/flink:1.1.3
    ports:
      - "48081:8081"
    command: jobmanager
    networks:
      - my-net
    deploy:
        mode: replicated
        replicas: 1
        restart_policy:
            condition: none
        placement:
            constraints:
                - node.role == manager

  taskmanager:
    image: affo/flink:1.1.3
    # note: depends_on is ignored by docker stack deploy (swarm mode)
    depends_on:
      - jobmanager
    command: taskmanager
    networks:
      - my-net
    deploy:
        mode: replicated
        replicas: 4
        restart_policy:
            condition: none
        placement:
            constraints:
                - node.role != manager

networks:
    my-net:
        external: true
The Docker image affo/flink:1.1.3 was built following the README and pushed to Docker Hub.

The network my-net is an attachable overlay network.

I tried pinging each container from the others using DNS names, and everything works fine.
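ping only exercises ICMP, while a TaskManager talks to the JobManager over TCP on the RPC port (6123, Flink's default jobmanager.rpc.port, as in the netstat output in this post). A minimal Python sketch of the equivalent TCP check; can_connect is a hypothetical helper, not part of Flink or Docker:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers DNS failures, connection refused and timeouts
        return False
```

Run from a TaskManager container, can_connect("jobmanager", 6123) roughly corresponds to an nc/telnet test. It only proves that the port accepts connections, not that the RPC layer behind it will accept messages.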

However, none of the TaskManagers can get through to the JobManager.

Here is the JobManager log:

and the log of one of the TaskManagers:

The JM has the VIP 10.0.42.7, and jobmanager.rpc.address is set to jobmanager, which resolves to 10.0.42.7.
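Since jobmanager.rpc.address points at the service name, it can help to compare what that name resolves to with the addresses of the actual task containers. In Swarm mode a service name normally resolves to the service VIP, while tasks.<service-name> resolves to the IPs of the individual tasks. A minimal Python sketch, assuming it runs inside a container attached to my-net; resolve_ipv4 and the script name used below are illustrative:

```python
import socket
import sys
from typing import List


def resolve_ipv4(name: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # each entry is (family, type, proto, canonname, (ip, port))
    return sorted({info[4][0] for info in infos})


def main(names: List[str]) -> None:
    # e.g. "jobmanager" (service VIP) vs "tasks.jobmanager" (task container IPs)
    for name in names:
        try:
            print(name, resolve_ipv4(name))
        except socket.gaierror as exc:
            print(name, "unresolved:", exc)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, python3 resolve_names.py jobmanager tasks.jobmanager (hypothetical file name) prints both lookups; if the two differ, the TaskManagers are being pointed at the VIP rather than at the JobManager container itself.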

Any help or hints on where to start troubleshooting would be greatly appreciated.

Thanks a lot.

UPDATE

I added the output of netstat -tulpn, run via docker exec in the JobManager container:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.11:40762        0.0.0.0:*               LISTEN      -
tcp        0      0 ::ffff:10.0.42.7:6123   :::*                    LISTEN      218/java
tcp        0      0 :::8081                 :::*                    LISTEN      218/java
tcp        0      0 :::34963                :::*                    LISTEN      218/java
udp        0      0 127.0.0.11:57000        0.0.0.0:*                           -
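The notable detail in this output is the local address of port 6123: the RPC socket is bound to the single IPv4-mapped address ::ffff:10.0.42.7 rather than to a wildcard address. A small hypothetical Python helper for extracting that from netstat -tulpn text, assuming the column layout shown above:

```python
from typing import List


def listeners_on_port(netstat_output: str, port: int) -> List[str]:
    """Return the bind addresses (port stripped) of TCP sockets in LISTEN state on `port`."""
    hosts = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # data rows: Proto Recv-Q Send-Q LocalAddress ForeignAddress State PID/Program
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue  # skips headers, UDP rows (no State column) and non-listening sockets
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            hosts.append(host)
    return hosts


def is_wildcard(host: str) -> bool:
    """True if the socket is bound to every interface (IPv4 or IPv6 wildcard)."""
    return host in ("0.0.0.0", "::")
```

Applied to the output above, listeners_on_port(text, 6123) gives ["::ffff:10.0.42.7"], which is not a wildcard, so the JobManager only accepts RPC connections addressed to that one IP.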

Thanks again

UPDATE

I recently managed to set jobmanager.rpc.address to 0.0.0.0 on the JobManager only, and now it is effectively listening on all interfaces:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.11:56218        0.0.0.0:*               LISTEN      -
tcp        0      0 :::6123                 :::*                    LISTEN      218/java
tcp        0      0 :::8081                 :::*                    LISTEN      218/java
tcp        0      0 :::55231                :::*                    LISTEN      218/java
udp        0      0 127.0.0.11:47549        0.0.0.0:*                           -
I can even connect to it with nc and telnet from a TaskManager.
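A working nc/telnet connection only shows that the TCP socket is reachable. Akka remoting, which Flink 1.1 uses for RPC, additionally checks that the address a message is sent to is one of the addresses the local actor system is bound to, and drops it otherwise; that is what the JobManager's "dropping message ... for non-local recipient" error in this post reports. Below is a simplified Python model of that check, not Akka's actual code; akka_address and is_local_recipient are hypothetical names:

```python
from typing import Iterable, Tuple
from urllib.parse import urlparse

# (protocol, actor system name, host, port)
Address = Tuple[str, str, str, int]


def akka_address(uri: str) -> Address:
    """Split 'akka.tcp://system@host:port/path' into (protocol, system, host, port)."""
    parsed = urlparse(uri)
    return (parsed.scheme, parsed.username or "", parsed.hostname or "", parsed.port or 0)


def is_local_recipient(recipient: str, inbound: Iterable[str]) -> bool:
    """Simplified model: the recipient's address must equal one of the bound addresses exactly."""
    return akka_address(recipient) in {akka_address(uri) for uri in inbound}
```

With the values from the log, is_local_recipient("akka.tcp://flink@10.0.42.7:6123/", ["akka.tcp://flink@0.0.0.0:6123"]) is False: the host 10.0.42.7 is compared literally against 0.0.0.0, so the message counts as non-local even though the connection itself succeeded.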

However, the problem is now the following error on the JobManager:

2017-02-09 10:31:20,794 ERROR akka.remote.EndpointWriter
- dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient
[Actor[akka.tcp://flink@10.0.42.7:6123/]] arriving at [akka.tcp://flink@10.0.42.7:6123]
inbound addresses are [akka.tcp://flink@0.0.0.0:6123]

Any help would be greatly appreciated, thanks.

UPDATE

I think I have isolated the problem. Issue opened on GitHub:

If you follow the issue opened on GitHub, you can see that the real problem is the Swarm native networking VIP assignment. I turned it off, and now everything works.

Actually, as of now there is no way to turn it off from the compose file, so I had to switch to a scripted deployment instead of the automatic docker stack deploy.
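A scripted deployment of this kind can pass --endpoint-mode dnsrr to docker service create, which makes the service name resolve directly to the task IPs instead of allocating a VIP. The sketch below is an assumption about what such a script could look like, not the author's actual script: it mirrors the services in stack.yml, prints the docker service create commands, and only executes them when given a hypothetical --run flag. Docker does not allow ingress-mode published ports together with dnsrr, so the web UI port is published in host mode here. (Later compose file formats, 3.2 and up, also accept deploy.endpoint_mode: dnsrr.)

```python
import shlex
import subprocess
import sys
from typing import List, Optional

IMAGE = "affo/flink:1.1.3"
NETWORK = "my-net"  # existing attachable overlay network


def service_create_args(name: str, command: str, replicas: int,
                        constraint: str, publish: Optional[str] = None) -> List[str]:
    """Build a `docker service create` command line for one Flink service."""
    args = ["docker", "service", "create",
            "--name", name,
            "--network", NETWORK,
            # dnsrr: no VIP, the service name resolves straight to the task IPs
            "--endpoint-mode", "dnsrr",
            "--replicas", str(replicas),
            "--restart-condition", "none",
            "--constraint", constraint]
    if publish:
        # ingress publishing is rejected with dnsrr, so publish in host mode
        args += ["--publish", publish]
    return args + [IMAGE, command]


def deployment() -> List[List[str]]:
    """Commands equivalent to the jobmanager/taskmanager services in stack.yml."""
    return [
        service_create_args("jobmanager", "jobmanager", 1, "node.role == manager",
                            publish="mode=host,published=48081,target=8081"),
        service_create_args("taskmanager", "taskmanager", 4, "node.role != manager"),
    ]


if __name__ == "__main__":
    run = "--run" in sys.argv[1:]
    for cmd in deployment():
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=True)
```

Because the services are created without a stack prefix, the name jobmanager used in jobmanager.rpc.address still resolves, now to the JobManager container's own IP.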