kubernetes中的Flink部署无法启动作业

kubernetes中的Flink部署无法启动作业,kubernetes,apache-flink,Kubernetes,Apache Flink,我按照以下指南在kubernetes创建了一个flink群集: 作业管理器正在运行。当作业提交到作业管理器时,它生成了一个任务管理器pod,但任务管理器无法连接到作业管理器 2020-10-29 13:22:51,069 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://flink@detection-engine-dev

我按照以下指南在kubernetes创建了一个flink群集:

作业管理器正在运行。当作业提交到作业管理器时,它生成了一个任务管理器pod,但任务管理器无法连接到作业管理器

2020-10-29 13:22:51,069 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Connecting to ResourceManager akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
2020-10-29 13:22:51,176 WARN  akka.remote.transport.netty.NettyTransport                   [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with java.io.IOException: Connection reset by peer
2020-10-29 13:22:51,176 WARN  akka.remote.transport.netty.NettyTransport                   [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException: Adjusted frame length exceeds 10485760: 352518404 - discarded
2020-10-29 13:22:51,180 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]
2020-10-29 13:22:51,183 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*.
2020-10-29 13:23:01,203 WARN  akka.remote.transport.netty.NettyTransport                   [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with java.io.IOException: Connection reset by peer

如果您的问题是TM找不到JM,则可能必须使用Statefulset类型的pod。因此,您可以配置它们的静态主机名。下面是我的例子

导致此错误的原因:“org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException:调整后的帧长度超过10485760:352518404-丢弃”
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: flink-taskmanager
  namespace: kafka
spec:
  replicas: 3
  serviceName: flink-taskmanager
  selector:
    matchLabels:
      app: flink-taskmanager # has to match .spec.template.metadata.labels
  template:
    metadata:
      labels:
        app: flink-taskmanager # has to match .spec.selector.matchLabels
    spec:
      hostname: flink-taskmanager
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      - name: tpch-dbgen-data
        persistentVolumeClaim:
          claimName: tpch-dbgen-data-pvc
      - name: tpch-dbgen-datarate
        persistentVolumeClaim:
          claimName: tpch-dbgen-datarate-pvc
      containers:
      - name: taskmanager
        image: felipeogutierrez/explore-flink:1.11.2-scala_2.12
        imagePullPolicy: Always # Always/IfNotPresent
        env:
        args: ["taskmanager"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        - containerPort: 9250
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        - name: tpch-dbgen-data
          mountPath: /opt/tpch-dbgen/data
          subPath: data
        - mountPath: /tmp
          name: tpch-dbgen-datarate
          subPath: tmp
        securityContext:
          runAsUser: 9999  #