故障转移后,哨兵无法将初始主服务器提升回主模式

我正在以哨兵模式运行Redis,其中有1个主节点,2个副本节点和3个哨兵节点。我正在docker swarm环境中运行所有节点。 所有节点启动正常。首先,我们为节点提供以下IP

master      10.0.20.2 
replica-1   10.0.20.5
replica-2   10.0.20.10

接下来,我停止主容器来关闭主节点,以便哨兵应选择副本节点之一作为新的主节点。 一切正常,并且选择了replica-1节点作为新的主节点。

与此同时,码头工人为master增添了一个新容器,它作为奴隶加入了redis前哨网络。

接下来,我关闭replica-1节点以进行另一个故障转移。现在,当哨兵试图将master节点从从节点升级到主节点时,就会发生实际问题。

以下是哨兵试图使其成为主节点的master节点redis配置文件。我想知道为什么当此节点是新的主节点且IP属于同一节点时,为什么用replicaof 10.0.20.2 6379更新文件。 主节点redis.conf

root@0fd67f6ceb37:/data# tail -f /etc/redis/redis.conf
replica-announce-ip "redis-master"
#replica-announce-port 6379
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbchecksum yes
# Generated by CONFIG REWRITE

replicaof 10.0.20.2 6379

这是错误的配置,因此有时会失败,并且哨兵选择replica-2节点作为新的主节点 这是master节点日志时出现的错误(以下是详细的日志文件) Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master 最后,replica-2充当masterreplica-1并充当两个从属。

主节点日志(这是在主节点作为从节点加入,并且哨兵试图将其提升为主模式之后)

[docker@chopswarm1 redis-failover]$ d logs 0fd67f6ceb37
1:C 05 Nov 2019 06:43:49.360 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 Nov 2019 06:43:49.360 # Redis version=5.0.5,bits=64,commit=00000000,modified=0,pid=1,just started
1:C 05 Nov 2019 06:43:49.360 # Configuration loaded
1:M 05 Nov 2019 06:43:49.361 * Running mode=standalone,port=6379.
1:M 05 Nov 2019 06:43:49.361 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Nov 2019 06:43:49.361 # Server initialized
1:M 05 Nov 2019 06:43:49.361 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 05 Nov 2019 06:43:49.361 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root,and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Nov 2019 06:43:49.361 * DB loaded from disk: 0.000 seconds
1:M 05 Nov 2019 06:43:49.361 * Ready to accept connections
1:S 05 Nov 2019 06:43:59.817 * Before turning into a replica,using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 05 Nov 2019 06:43:59.817 * REPLICAOF 10.0.20.5:6379 enabled (user request from 'id=5 addr=10.0.20.7:60534 fd=10 name=sentinel-38a1e461-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=148 qbuf-free=32620 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 05 Nov 2019 06:43:59.817 # CONFIG REWRITE executed with success.
1:S 05 Nov 2019 06:44:00.386 * Connecting to MASTER 10.0.20.5:6379
1:S 05 Nov 2019 06:44:00.387 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:00.387 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:00.387 * Master replied to PING,replication can continue...
1:S 05 Nov 2019 06:44:00.387 * Trying a partial resynchronization (request 0b1ed09c8d497744632c93cab960c4ca4ee9a11e:1).
1:S 05 Nov 2019 06:44:00.388 * Full resync from master: f3c311652d8860c93048eba075521df7033cab2f:38645
1:S 05 Nov 2019 06:44:00.388 * Discarding previously cached master state.
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: receiving 178 bytes from master
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: Flushing old data
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: Finished with success
1:S 05 Nov 2019 06:44:35.367 # Connection with master lost.
1:S 05 Nov 2019 06:44:35.367 * Caching the disconnected master state.
1:S 05 Nov 2019 06:44:35.464 * Connecting to MASTER 10.0.20.5:6379
1:S 05 Nov 2019 06:44:35.465 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:35.465 # Error condition on socket for SYNC: Connection refused
1:S 05 Nov 2019 06:44:36.466 * Connecting to MASTER 10.0.20.5:6379
1:S 05 Nov 2019 06:44:36.466 * MASTER <-> REPLICA sync started
1:M 05 Nov 2019 06:44:40.748 # Setting secondary replication ID to f3c311652d8860c93048eba075521df7033cab2f,valid up to offset: 46004. New replication ID is 77213f07383dd307e4b6d917b6a8789de42cad20
1:M 05 Nov 2019 06:44:40.748 * Discarding previously cached master state.
1:M 05 Nov 2019 06:44:40.748 * MASTER MODE enabled (user request from 'id=16 addr=10.0.20.7:60576 fd=17 name=sentinel-38a1e461-cmd age=31 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=140 qbuf-free=32628 obl=36 oll=0 omem=0 events=r cmd=exec')
1:M 05 Nov 2019 06:44:40.748 # CONFIG REWRITE executed with success.
1:M 05 Nov 2019 06:44:41.881 * Replica redis-replica-2:6379 asks for synchronization
1:M 05 Nov 2019 06:44:41.881 * Partial resynchronization request from redis-replica-2:6379 accepted. Sending 881 bytes of backlog starting from offset 46004.
1:S 05 Nov 2019 06:44:43.132 # Connection with replica redis-replica-2:6379 lost.
1:S 05 Nov 2019 06:44:43.132 * Before turning into a replica,using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 05 Nov 2019 06:44:43.132 * REPLICAOF 10.0.20.2:6379 enabled (user request from 'id=24 addr=10.0.20.7:60636 fd=15 name=sentinel-38a1e461-cmd age=3 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=291 qbuf-free=32477 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 05 Nov 2019 06:44:43.133 # CONFIG REWRITE executed with success.
1:S 05 Nov 2019 06:44:43.484 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:43.484 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:43.484 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:43.484 * Master replied to PING,replication can continue...
1:S 05 Nov 2019 06:44:43.484 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:43.484 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:44.489 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:44.489 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:44.489 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:44.489 * Master replied to PING,replication can continue...
1:S 05 Nov 2019 06:44:44.490 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:44.490 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:45.489 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:45.490 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:45.490 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:45.490 * Master replied to PING,replication can continue...
1:S 05 Nov 2019 06:44:45.490 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:45.490 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:46.493 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:46.493 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:46.493 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:46.493 * Master replied to PING,replication can continue...
1:S 05 Nov 2019 06:44:46.493 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:46.494 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:47.493 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:47.494 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:47.494 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:47.494 * Master replied to PING,replication can continue...
1:S 05 Nov 2019 06:44:47.494 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).

<-- omitted few entries for the same errors as above for better readability -->

1:S 05 Nov 2019 06:45:21.575 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:45:21.575 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:45:21.575 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:45:21.575 * Master replied to PING,replication can continue...
1:S 05 Nov 2019 06:45:21.575 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:45:21.575 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:45:22.456 * REPLICAOF 10.0.20.10:6379 enabled (user request from 'id=113 addr=10.0.20.7:60950 fd=12 name=sentinel-38a1e461-cmd age=5 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=150 qbuf-free=32618 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 05 Nov 2019 06:45:22.456 # CONFIG REWRITE executed with success.
1:S 05 Nov 2019 06:45:22.577 * Connecting to MASTER 10.0.20.10:6379
1:S 05 Nov 2019 06:45:22.577 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:45:22.577 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:45:22.577 * Master replied to PING,replication can continue...
1:S 05 Nov 2019 06:45:22.577 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:45:22.577 * Successful partial resynchronization with master.
1:S 05 Nov 2019 06:45:22.577 # Master replication ID changed to 3235720aad34423d6f82f9db4a953042c1f16d58
1:S 05 Nov 2019 06:45:22.577 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

前哨日志文件(已在故障转移开始时添加了其他换行符)

root@3708cf05eca4:/data# cat sentinel.log
1:X 05 Nov 2019 06:40:49.116 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 05 Nov 2019 06:40:49.116 # Redis version=5.0.5,just started
1:X 05 Nov 2019 06:40:49.116 # Configuration loaded
1:X 05 Nov 2019 06:40:49.117 * Running mode=sentinel,port=26379.
1:X 05 Nov 2019 06:40:49.117 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 05 Nov 2019 06:40:49.119 # Sentinel ID is 38a1e461910e17fb7be79e695040074df2dde2df
1:X 05 Nov 2019 06:40:49.119 # +monitor master eaas-redis-master 10.0.20.2 6379 quorum 2
1:X 05 Nov 2019 06:40:49.120 * +slave slave redis-replica-1:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:40:51.183 * +sentinel sentinel 3b0831ce9f6aff70f9bf45f4211d66ebfd1c6a21 10.0.20.33 26379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:40:59.150 * +slave slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:40:59.202 * +fix-slave-config slave redis-replica-1:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:41:01.362 * +sentinel sentinel 464f3750404b419fccf513784f40baf7f6622cba 10.0.20.41 26379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:41:09.249 * +fix-slave-config slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.2 6379



1:X 05 Nov 2019 06:43:48.513 # +sdown master eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:43:48.594 # +new-epoch 1
1:X 05 Nov 2019 06:43:48.595 # +vote-for-leader 464f3750404b419fccf513784f40baf7f6622cba 1
1:X 05 Nov 2019 06:43:48.613 # +odown master eaas-redis-master 10.0.20.2 6379 #quorum 2/2
1:X 05 Nov 2019 06:43:48.613 # Next failover delay: I will not start a failover before Tue Nov  5 06:43:59 2019
1:X 05 Nov 2019 06:43:49.732 # +config-update-from sentinel 464f3750404b419fccf513784f40baf7f6622cba 10.0.20.41 26379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:43:49.732 # +switch-master eaas-redis-master 10.0.20.2 6379 10.0.20.5 6379
1:X 05 Nov 2019 06:43:49.732 * +slave slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:43:49.732 * +slave slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:43:49.785 * +slave slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:43:59.816 * +convert-to-slave slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:09.832 * +slave slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379



1:X 05 Nov 2019 06:44:40.453 # +sdown master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.524 # +odown master eaas-redis-master 10.0.20.5 6379 #quorum 2/2
1:X 05 Nov 2019 06:44:40.524 # +new-epoch 2
1:X 05 Nov 2019 06:44:40.524 # +try-failover master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.525 # +vote-for-leader 38a1e461910e17fb7be79e695040074df2dde2df 2
1:X 05 Nov 2019 06:44:40.525 # 3b0831ce9f6aff70f9bf45f4211d66ebfd1c6a21 voted for 3b0831ce9f6aff70f9bf45f4211d66ebfd1c6a21 2
1:X 05 Nov 2019 06:44:40.528 # 464f3750404b419fccf513784f40baf7f6622cba voted for 38a1e461910e17fb7be79e695040074df2dde2df 2
1:X 05 Nov 2019 06:44:40.580 # +elected-leader master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.580 # +failover-state-select-slave master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.681 # +selected-slave slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.681 * +failover-state-send-slaveof-noone slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.748 * +failover-state-wait-promotion slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.003 # +promoted-slave slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.003 # +failover-state-reconf-slaves master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.101 * +slave-reconf-sent slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.598 # -odown master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:42.050 * +slave-reconf-inprog slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:42.050 * +slave-reconf-done slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:42.107 * +slave-reconf-sent slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:43.056 * +slave-reconf-inprog slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:43.056 * +slave-reconf-done slave redis-replica-2:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:43.132 * +slave-reconf-sent slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:44.111 * +slave-reconf-inprog slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 # +failover-end-for-timeout master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 # +failover-end master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 * +slave-reconf-sent-be slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 * +slave-reconf-sent-be slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 # +switch-master eaas-redis-master 10.0.20.5 6379 10.0.20.2 6379
1:X 05 Nov 2019 06:44:46.057 * +slave slave 10.0.20.10:6379 10.0.20.10 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:44:46.057 * +slave slave 10.0.20.5:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:44:51.062 # +sdown slave 10.0.20.5:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.2 6379


1:X 05 Nov 2019 06:45:11.226 # +sdown master eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:45:16.233 # +new-epoch 3
1:X 05 Nov 2019 06:45:16.234 # +vote-for-leader 464f3750404b419fccf513784f40baf7f6622cba 3
1:X 05 Nov 2019 06:45:16.535 # +odown master eaas-redis-master 10.0.20.2 6379 #quorum 3/2
1:X 05 Nov 2019 06:45:16.535 # Next failover delay: I will not start a failover before Tue Nov  5 06:45:26 2019
1:X 05 Nov 2019 06:45:17.285 # +config-update-from sentinel 464f3750404b419fccf513784f40baf7f6622cba 10.0.20.41 26379 @ eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:45:17.285 # +switch-master eaas-redis-master 10.0.20.2 6379 10.0.20.10 6379
1:X 05 Nov 2019 06:45:17.285 * +slave slave 10.0.20.5:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:17.285 * +slave slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:22.456 * +fix-slave-config slave 10.0.20.2:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:26.347 * +slave slave redis-replica-1:6379 10.0.20.5 6379 @ eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:26.348 * +slave slave redis-master:6379 10.0.20.2 6379 @ eaas-redis-master 10.0.20.10 6379
root@3708cf05eca4:/data#

所以我想知道为什么哨兵只用replicaof重写主节点的配置文件(升级到MASTER模式时仅发生在master节点而不是副本节点) 我该如何改善这种情况,以便在故障转移期间哨兵将master节点重新置于主模式下。

请让我知道是否需要更多信息。

以下是启动docker swarm堆栈时主节点和副本节点的redis配置文件。 redis.conf(master)

dir /data/

replica-announce-ip {{REDIS_MASTER}}

save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbchecksum yes

redis.conf(副本)

replicaof {{REDIS_MASTER}} 6379
dir /data/

replica-announce-ip {{REDIS_REPLICA}}

save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbchecksum yes

weizhaoxia96 回答:故障转移后,哨兵无法将初始主服务器提升回主模式

docker服务与docker容器IP有一些特定的问题:https://github.com/moby/moby/issues/30963

因此,当您设置REDIS_MASTER_HOST:redis-master(例如)环境时,它指向docker服务IP,而不是实际的redis-master容器IP-并且此行为破坏了前哨逻辑。

我在docker-compose(请注意dnsrr endpoint_mode)中使用了以下配置:

redis-master:
 image: bitnami/redis:5.0.9
 environment:
   REDIS_REPLICATION_MODE: master
   ALLOW_EMPTY_PASSWORD: 'yes'
   REDIS_AOF_ENABLED: 'no'
 deploy:
   endpoint_mode: dnsrr

此配置为redis-master DNS服务记录和容器本身提供了一个IP。但是在这种情况下,redis-master容器重启后的IP地址将更改,因此失败后,我将使用以下命令重新创建哨兵配置:

# on redis-master
redis-cli SLAVEOF <new IP master> 6379
# on all sentinel nodes
redis-cli -p 26379 SENTINEL REMOVE mymaster
redis-cli -p 26379 SENTINEL monitor mymaster <new IP redis-master> 6379 2

此操作后,redis-master将正确更改角色。

另一个选择:

我要求功能为docker swarm环境中的Redis获取正确的IP:https://github.com/bitnami/bitnami-docker-redis/issues/174

然后我创建了fork:https://github.com/tartemov/bitnami-docker-redis(dns_lookup函数有一项更改)

在这种情况下,docker-compose文件如下所示:

services:
 redis-master:
  image: ndocker-registry/redis:5.0.9
  hostname: "redis-master"
   environment:
     REDIS_REPLICATION_MODE: master
     ALLOW_EMPTY_PASSWORD: 'yes'
     REDIS_AOF_ENABLED: 'no'
 redis-slave-1:
   image: docker-registry/redis:5.0.9
   environment:
     REDIS_REPLICATION_MODE: slave
     REDIS_MASTER_HOST: redis-master
     ALLOW_EMPTY_PASSWORD: 'yes'
     REDIS_AOF_ENABLED: 'no'
sentinel:
  image: bitnami/redis-sentinel:5.0.9-debian-10-r49
  environment:
    REDIS_MASTER_HOST: redis-master
    ALLOW_EMPTY_PASSWORD: 'yes'
    REDIS_AOF_ENABLED: 'no'
    REDIS_SENTINEL_DOWN_AFTER_MILLISECONDS: 5000
    REDIS_SENTINEL_FAILOVER_TIMEOUT: 60000
  deploy:
    mode: replicated
    replicas: 3

注意主机名属性-使用此属性和新的dns_lookup函数redis可以注释IP地址

您无需执行任何手动操作-服务地址不会更改,并且哨兵将为任何Redis节点正确设置角色

本文链接:https://www.f2er.com/3161586.html

大家都在问