Redis HA Helm chart with persistence enabled: sync issue
We are currently running redis-ha with persistence enabled and 3 replicas (v4.4.4) on Kubernetes (RKE, on-prem) with the Longhorn storage class. For some unknown reason, the master and the slaves stop syncing. It can happen 30 minutes after a restart or a day after a restart, and we eventually end up with the following errors.
Slave 1 error
redis-cli role
1) "slave"
2) "10.43.6.52"
3) (integer) 6379
4) "connect"
5) (integer) -1
Master error
redis-cli role
1) "master"
2) (integer) 809012256
3) 1) 1) "10.43.254.123"
      2) "6379"
      3) "809011980"
   2) 1) "10.43.229.244"
      2) "6379"
      3) "0"
3944:C 22 Sep 2020 09:57:16.102 * RDB: 0 MB of memory used by copy-on-write
1:M 22 Sep 2020 09:57:16.176 * Background saving terminated with success
1:M 22 Sep 2020 09:57:16.176 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 22 Sep 2020 09:57:16.177 * Background RDB transfer started by pid 3945
1:M 22 Sep 2020 09:57:21.283 # Connection with replica 10.43.229.244:6379 lost.
1:M 22 Sep 2020 09:57:21.286 # Background transfer error
1:M 22 Sep 2020 09:57:21.601 * Replica 10.43.229.244:6379 asks for synchronization
1:M 22 Sep 2020 09:57:21.601 * Full resync requested by replica 10.43.229.244:6379
1:M 22 Sep 2020 09:57:21.601 * Delay next BGSAVE for diskless SYNC
1:M 22 Sep 2020 09:57:27.241 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 22 Sep 2020 09:57:27.243 * Background RDB transfer started by pid 3946
1:M 22 Sep 2020 09:57:32.254 # Connection with replica 10.43.229.244:6379 lost.
1:M 22 Sep 2020 09:57:32.266 # Background transfer error
1:M 22 Sep 2020 09:57:32.563 * Replica 10.43.229.244:6379 asks for synchronization
1:M 22 Sep 2020 09:57:32.563 * Full resync requested by replica 10.43.229.244:6379
1:M 22 Sep 2020 09:57:32.563 * Delay next BGSAVE for diskless SYNC
1:M 22 Sep 2020 09:57:38.304 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 22 Sep 2020 09:57:38.304 * Background RDB transfer started by pid 3947
1:M 22 Sep 2020 09:57:43.315 # Connection with replica 10.43.229.244:6379 lost.
1:M 22 Sep 2020 09:57:43.476 # Background transfer error
1:M 22 Sep 2020 09:57:43.517 * Replica 10.43.229.244:6379 asks for synchronization
1:M 22 Sep 2020 09:57:43.517 * Full resync requested by replica 10.43.229.244:6379
1:M 22 Sep 2020 09:57:43.517 * Delay next BGSAVE for diskless SYNC
1:M 22 Sep 2020 09:57:47.098 * 1 changes in 30 seconds. Saving...
1:M 22 Sep 2020 09:57:47.098 * Background saving started by pid 3948
3948:C 22 Sep 2020 09:57:47.124 * DB saved on disk
3948:C 22 Sep 2020 09:57:47.124 * RDB: 0 MB of memory used by copy-on-write
1:M 22 Sep 2020 09:57:47.199 * Background saving terminated with success
1:M 22 Sep 2020 09:57:47.199 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 22 Sep 2020 09:57:47.199 * Background RDB transfer started by pid 3949
3949:C 22 Sep 2020 09:57:47.255 * RDB: 1 MB of memory used by copy-on-write
1:M 22 Sep 2020 09:57:47.299 * Background RDB transfer terminated with success
1:M 22 Sep 2020 09:57:47.299 # Slave 10.43.229.244:6379 correctly received the streamed RDB file.
1:M 22 Sep 2020 09:57:47.299 * Streamed RDB transfer with replica 10.43.229.244:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 22 Sep 2020 09:57:52.214 # Connection with replica 10.43.229.244:6379 lost.
1:M 22 Sep 2020 09:57:52.517 * Replica 10.43.229.244:6379 asks for synchronization
1:M 22 Sep 2020 09:57:52.517 * Full resync requested by replica 10.43.229.244:6379
1:M 22 Sep 2020 09:57:52.518 * Delay next BGSAVE for diskless SYNC
1:M 22 Sep 2020 09:57:58.355 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 22 Sep 2020 09:57:58.357 * Background RDB transfer started by pid 3950
1:M 22 Sep 2020 09:58:03.422 # Connection with replica 10.43.229.244:6379 lost.
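In the master log above, every "Connection with replica ... lost" line follows a "Background RDB transfer started" line by roughly five seconds. A minimal sketch for measuring that interval from a log excerpt is shown below. The function name `seconds_until_disconnect` and the regular expression are illustrative assumptions, not part of Redis or the Helm chart, and the parser assumes the English month abbreviations used in the log.

```python
import re
from datetime import datetime

# Matches lines such as:
# "1:M 22 Sep 2020 09:57:16.177 * Background RDB transfer started by pid 3945"
LINE_RE = re.compile(
    r"^\d+:[A-Z] (\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) [*#] (.*)$"
)

def seconds_until_disconnect(log_text):
    """For each RDB transfer start, return seconds until the next lost connection."""
    durations = []
    started = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S.%f")
        msg = match.group(2)
        if msg.startswith("Background RDB transfer started"):
            started = ts
        elif (
            msg.startswith("Connection with replica")
            and msg.endswith("lost.")
            and started is not None
        ):
            durations.append(round((ts - started).total_seconds(), 3))
            started = None
    return durations

# Four lines copied from the master log in the question.
SAMPLE = """\
1:M 22 Sep 2020 09:57:16.177 * Background RDB transfer started by pid 3945
1:M 22 Sep 2020 09:57:21.283 # Connection with replica 10.43.229.244:6379 lost.
1:M 22 Sep 2020 09:57:27.243 * Background RDB transfer started by pid 3946
1:M 22 Sep 2020 09:57:32.254 # Connection with replica 10.43.229.244:6379 lost.
"""

print(seconds_until_disconnect(SAMPLE))
```

A constant interval like this suggests the replica is dropping the connection on a timer or on a failing step during load, rather than at random, which narrows down where to look on the replica side.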
Redis config:
dir "/data"
port 6379
maxmemory 0
maxmemory-policy volatile-lru
min-replicas-max-lag 5
min-replicas-to-write 1
rdbchecksum yes
rdbcompression yes
repl-diskless-sync yes
save 30 1
timeout 1000
slaveof 10.43.254.123 6379
slave-announce-ip 10.43.6.52
slave-announce-port 6379
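The `min-replicas-to-write 1` and `min-replicas-max-lag 5` settings above mean the master only accepts writes while at least one replica has acknowledged within the last 5 seconds. The sketch below is a simplified model of that rule, not Redis source code; the function name and the lag values are hypothetical.

```python
def master_accepts_writes(replica_lags, min_replicas_to_write=1, min_replicas_max_lag=5):
    """Simplified model of the min-replicas-* check on a Redis master.

    replica_lags: seconds since each replica last acknowledged (hypothetical values).
    """
    # Setting either option to 0 disables the check.
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True
    good = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write

print(master_accepts_writes([1, 60]))   # one healthy replica, one stuck
print(master_accepts_writes([60, 60]))  # no replica within the lag limit
```

With one replica still in sync, as in the `role` output above, writes would continue to be accepted; if the second replica also fell behind, the master would start rejecting them.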
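My thoughts so far:
- The keys come from RabbitMQ, and developers sometimes shut down the consumers so that messages pile up. That backlog could put a heavy load on Redis, but I found no logs to support this.
- The Longhorn storage class may be corrupted, but I found no logs for this either.
I am open to any suggestions.
Update: this was caused by Longhorn. After migrating the storage class to SSD, everything works fine.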
Slave log:
6049:C 22 Sep 2020 09:55:55.091 # Failed opening the RDB file dump.rdb (in server root dir /data) for saving: I/O error
1:S 22 Sep 2020 09:55:55.188 # Background saving error
1:S 22 Sep 2020 09:56:01.002 * 1 changes in 30 seconds. Saving...
1:S 22 Sep 2020 09:56:01.002 * Background saving started by pid 6050
6050:C 22 Sep 2020 09:56:01.004 # Failed opening the RDB file dump.rdb (in server root dir /data) for saving: I/O error
1:S 22 Sep 2020 09:56:01.102 # Background saving error
1:S 22 Sep 2020 09:56:07.013 * 1 changes in 30 seconds. Saving...
1:S 22 Sep 2020 09:56:07.014 * Background saving started by pid 6051
6051:C 22 Sep 2020 09:56:07.016 # Failed opening the RDB file dump.rdb (in server root dir /data) for saving: I/O error
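The slave log above shows Redis failing to open `dump.rdb` in `/data` with an I/O error, which points at the volume rather than at Redis itself. A minimal sketch for probing whether a directory accepts writes, and how long a write plus fsync takes, is shown below. The function name `probe_write_latency`, the payload size, and the round count are assumptions for illustration; the script would need to run inside the pod that mounts the volume.

```python
import os
import tempfile
import time

def probe_write_latency(directory, size_bytes=1024 * 1024, rounds=5):
    """Write and fsync a temporary file `rounds` times; return seconds per round."""
    payload = os.urandom(size_bytes)
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        # The temporary file is created inside `directory` and deleted on close.
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    # Assumption: "/data" is the Redis data directory, as in the config above.
    # Falls back to the system temp directory when "/data" does not exist.
    target = "/data" if os.path.isdir("/data") else tempfile.gettempdir()
    try:
        results = probe_write_latency(target)
    except OSError as exc:
        print(f"write to {target} failed: {exc}")
    else:
        print("seconds per round:", [round(t, 4) for t in results])
```

An `OSError` here would reproduce the same class of failure that Redis reports, and large or erratic timings would be consistent with the slow storage that the SSD migration resolved.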