Cluster listen address vs. advertise address in Prometheus Alertmanager


I am trying to set up Alertmanager in HA mode. I use docker compose to spin up my Alertmanager instances. Below is the configuration used on the 2 instances:

alertmanager:
  image: prom/alertmanager
  restart: always
  logging:
    # limit logs retained on host to 25MB
    driver: "json-file"
    options:
      max-size: "500k"
      max-file: "50"
  volumes:
    - ./config:/prometheus
    - /var/lib/grafana/alertmanager:/data
  command:
    - '--config.file=/prometheus/alertmanager.yml'
    - '--storage.path=/data'
    - '--cluster.listen-address=localhost:9093'
    - '--cluster.peer=1xx.xx.xx.136:9093'
  ports:
    - 9093:9093
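Each instance reports an error when it tries to join the other. The output, taken from just one of the Alertmanagers, is the log at the end of this post that begins with "couldn't deduce an advertise address".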

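I have checked that port 9093 is used only by Alertmanager on that host and that nothing else is bound to it. There is also connectivity between the hosts on port 9093, since telnet works fine.

If I remove the listen or advertise parameter, I get the following error instead: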

level=info ts=2019-06-28T16:57:54.175757472Z caller=main.go:141 build_context="(go=go1.12.4, user=root@932a86a52b76, date=20190503-09:10:07)"
level=info ts=2019-06-28T16:57:54.1764299Z caller=cluster.go:161 component=cluster msg="setting advertise address explicitly" addr=172.19.0.5 port=9094
level=warn ts=2019-06-28T16:57:54.18422936Z caller=cluster.go:226 component=cluster msg="failed to join cluster" err="1 error occurred:\n\t* Failed to join 1xx.xx.xx.136: received invalid msgType (72), expected pushPullMsg (6) from=1xx.xx.xx.136:9093\n\n"
level=info ts=2019-06-28T16:57:54.184265727Z caller=cluster.go:228 component=cluster msg="will retry joining cluster every 10s"
level=warn ts=2019-06-28T16:57:54.184284236Z caller=main.go:230 msg="unable to join gossip mesh" err="1 error occurred:\n\t* Failed to join 1xx.xx.xx.136: received invalid msgType (72), expected pushPullMsg (6) from=172.17.21.137:9093\n\n"
level=info ts=2019-06-28T16:57:54.191170679Z caller=cluster.go:613 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2019-06-28T16:57:54.222369961Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/prometheus/alertmanager.yml
level=info ts=2019-06-28T16:57:54.222773958Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/prometheus/alertmanager.yml
level=info ts=2019-06-28T16:57:54.225423449Z caller=main.go:365 msg=Listening address=:9093
level=info ts=2019-06-28T16:57:56.191493442Z caller=cluster.go:638 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000213756s
level=info ts=2019-06-28T16:58:04.193151572Z caller=cluster.go:630 component=cluster msg="gossip settled; proceeding" elapsed=10.001876299s
level=warn ts=2019-06-28T16:58:09.1931086Z caller=cluster.go:428 component=cluster msg=refresh result=failure addr=1xx.xx.xx.136:9093

Can someone confirm whether I am using the listen and advertise address parameters incorrectly?

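Specifying localhost as the listen address means the process only accepts connections on the loopback interface, so other nodes cannot reach it.
You need to specify an address that the other nodes can reach, or something like the default 0.0.0.0:..., which listens on all available interfaces.
For details, see the Alertmanager documentation.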
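
As a rough sketch of what this answer suggests, the compose fragment below listens on all interfaces and moves gossip to its own port. It rests on several assumptions that are not in the original post: that the peer's cluster port is 9094 (Alertmanager's default cluster port, which also appears as port=9094 in the log above), that `THIS_HOST_IP` is a hypothetical placeholder for this host's routable address, and that publishing 9094 over both TCP and UDP is acceptable in this environment. The logging section is left out for brevity.

```yaml
alertmanager:
  image: prom/alertmanager
  restart: always
  volumes:
    - ./config:/prometheus
    - /var/lib/grafana/alertmanager:/data
  command:
    - '--config.file=/prometheus/alertmanager.yml'
    - '--storage.path=/data'
    # Listen for gossip on all interfaces inside the container,
    # on a port separate from the web/API port 9093.
    - '--cluster.listen-address=0.0.0.0:9094'
    # Assumption: THIS_HOST_IP is a placeholder for the address the
    # other host can reach, not the container's bridge address.
    - '--cluster.advertise-address=THIS_HOST_IP:9094'
    # Assumption: the peer exposes its gossip port on 9094 as well.
    - '--cluster.peer=1xx.xx.xx.136:9094'
  ports:
    - 9093:9093        # web UI and API
    - 9094:9094/tcp    # cluster gossip (TCP)
    - 9094:9094/udp    # cluster gossip (UDP)
```

Keeping the web port and the cluster port separate should avoid the "address already in use" error. Pointing `--cluster.peer` at the peer's gossip port rather than its HTTP port is a likely fix for the "invalid msgType (72)" error: 72 is the ASCII code for "H", which is what an HTTP response starts with. The second instance would need the mirror-image settings.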

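Log from the first attempt (with --cluster.listen-address=localhost:9093), as referred to at the start of the question:

level=warn ts=2019-06-28T16:38:58.104296695Z caller=cluster.go:154 component=cluster err="couldn't deduce an advertise address: failed to parse bind addr 'localhost'"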
level=warn ts=2019-06-28T16:39:08.107555731Z caller=cluster.go:226 component=cluster msg="failed to join cluster" err="1 error occurred:\n\t* Failed to join 1xx.xx.xx.136: read tcp 1xx.19.0.5:41214->1xx.xx.xx.136: i/o timeout\n\n"
level=info ts=2019-06-28T16:39:08.107599804Z caller=cluster.go:228 component=cluster msg="will retry joining cluster every 10s"
level=warn ts=2019-06-28T16:39:08.107631853Z caller=main.go:230 msg="unable to join gossip mesh" err="1 error occurred:\n\t* Failed to join 1xx.xx.xx.136: read tcp 1xx.19.0.5:41214->1xx.xx.xx.136:9093: i/o timeout\n\n"
level=info ts=2019-06-28T16:39:08.107693688Z caller=cluster.go:613 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2019-06-28T16:39:08.140619467Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/prometheus/alertmanager.yml
level=info ts=2019-06-28T16:39:08.141617461Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/prometheus/alertmanager.yml
level=info ts=2019-06-28T16:39:08.145128833Z caller=main.go:365 msg=Listening address=:9093
level=error ts=2019-06-28T16:39:08.145275648Z caller=main.go:367 msg="Listen error" err="listen tcp :9093: bind: address already in use"