Redis에서 PSYNC가 실패했을 때, SYNC로 동기화가 진행되는지 궁금증이 들었고, 이를 실습하는 과정에서 알게 된 내용을 정리하기 위해 글을 작성하게 되었습니다.
PSYNC에 대한 개념은 해당 포스팅을 참조해주세요. 📝
1. 실습 준비
전체 실습은 Docker Compose를 사용해 6개의 Redis 노드를 구성하고, PSYNC가 실패했을 때 SYNC가 발생하는지 확인하는 방식으로 진행됩니다.
1-1. directory
먼저, 실습을 위한 디렉토리를 생성합니다. 아래와 같이 docker-compose.yml 과 redis.conf 파일을 준비합니다.
1
2
3
4
5
| .
├── docker-compose.yml
└── redis.conf
1 directory, 2 files
|
1-2. docker-compose.yml
도커 컴포즈 파일을 작성하여 6개의 Redis 노드를 구성합니다. 각 노드는 서로 다른 포트에서 실행되며, 클러스터 모드로 설정됩니다.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
| # docker-compose.yml
version: '3.8'
services:
redis-node-0:
image: redis:7
container_name: redis-node-0
ports:
- "7010:7000"
volumes:
- ./redis.conf:/usr/local/etc/redis/redis.conf
command: ["redis-server", "/usr/local/etc/redis/redis.conf", "--port", "7000"]
redis-node-1:
image: redis:7
container_name: redis-node-1
ports:
- "7001:7001"
volumes:
- ./redis.conf:/usr/local/etc/redis/redis.conf
command: ["redis-server", "/usr/local/etc/redis/redis.conf", "--port", "7001"]
redis-node-2:
image: redis:7
container_name: redis-node-2
ports:
- "7002:7002"
volumes:
- ./redis.conf:/usr/local/etc/redis/redis.conf
command: ["redis-server", "/usr/local/etc/redis/redis.conf", "--port", "7002"]
redis-node-3:
image: redis:7
container_name: redis-node-3
ports:
- "7003:7003"
volumes:
- ./redis.conf:/usr/local/etc/redis/redis.conf
command: ["redis-server", "/usr/local/etc/redis/redis.conf", "--port", "7003"]
redis-node-4:
image: redis:7
container_name: redis-node-4
ports:
- "7004:7004"
volumes:
- ./redis.conf:/usr/local/etc/redis/redis.conf
command: ["redis-server", "/usr/local/etc/redis/redis.conf", "--port", "7004"]
redis-node-5:
image: redis:7
container_name: redis-node-5
ports:
- "7005:7005"
volumes:
- ./redis.conf:/usr/local/etc/redis/redis.conf
command: ["redis-server", "/usr/local/etc/redis/redis.conf", "--port", "7005"]
|
1-3. redis.conf
Redis 클러스터 설정을 위한 redis.conf 파일을 작성합니다. 이 파일은 클러스터 모드를 활성화하고, 노드 간의 통신을 설정합니다.
1
2
3
4
5
6
| # redis.conf
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
loglevel verbose
|
2. 실습
실습은 크게 데이터 저장, 마스터 노드 중지, PSYNC/SYNC 로그 확인의 세 단계로 진행됩니다.
- 데이터 저장
- 마스터 노드 중지
- 마스터 노드 복구 및 PSYNC/SYNC 로그 확인
2-1. 데이터 저장
먼저 클러스터에 데이터를 저장합니다.
1
| $ docker exec -it redis-node-0 redis-cli -p 7000 -c set psync_key "testing_psync"
|
데이터 정상 저장 여부를 확인 합니다.
1
| $ docker exec -it redis-node-0 redis-cli -p 7000 -c get psync_key
|
2-2. 마스터 노드 중지
특정 마스터(redis-node-0)의 노드를 중지 합니다.
1
| $ docker stop redis-node-0
|
클러스터 노드 상태를 다른 노드에서 확인 합니다.
1
| $ docker exec -it redis-node-4 redis-cli -p 7004 cluster nodes
|
1
2
3
4
5
6
| a2174d976880f6d0e18d0ceb385e034c001aad62 172.23.0.3:7000@17000 slave 5ff792fa9f23c6415afef24e709895b7f0f2e037 1750587525542 1750587523000 7 connected
ba50503d83a224267351c4e83e12e25122153358 172.23.0.6:7001@17001 master - 0 1750587527070 2 connected 5461-10922
78c4f9acc89f10341950e7b8416dd4b6a20f196c 172.23.0.2:7005@17005 slave ba50503d83a224267351c4e83e12e25122153358 0 1750587525000 2 connected
8391909a0cdd2f5f396abff507eb2b10202b8f46 172.23.0.5:7002@17002 master - 0 1750587526000 3 connected 10923-16383
be1f37458b210cdfd4f5248401800528218711ec 172.23.0.4:7003@17003 slave 8391909a0cdd2f5f396abff507eb2b10202b8f46 0 1750587526049 3 connected
5ff792fa9f23c6415afef24e709895b7f0f2e037 172.23.0.7:7004@17004 myself,master - 0 0 7 connected 0-5460
|
2-3. 마스터 노드 복구 및 PSYNC/SYNC 로그 확인
중지했던 마스터 노드(redis-node-0)를 재시작 합니다.
1
| $ docker start redis-node-0
|
마스터 노드에서 PSYNC/SYNC 로그를 확인 합니다.
1
| $ docker logs redis-node-0 | grep -Ei 'psync|sync|replica'
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
| $ docker logs redis-node-0 | grep -Ei 'psync|sync|replica'
1:M 21 Jun 2025 23:48:55.231 * Replica 172.23.0.7:7004 asks for synchronization
1:M 21 Jun 2025 23:48:55.231 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '39ba0c3263fd29bcec9aece97c634d1250b60dba', my replication IDs are '58d5b6872a3e8757ec747cc332258216e2ea76d2' and '0000000000000000000000000000000000000000')
1:M 21 Jun 2025 23:48:55.231 * Replication backlog created, my new replication IDs are 'c7b61223f5ece8b771133ce02ecc83759dbea18b' and '0000000000000000000000000000000000000000'
1:M 21 Jun 2025 23:48:55.231 * Delay next BGSAVE for diskless SYNC
1:M 21 Jun 2025 23:49:00.599 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 21 Jun 2025 23:49:00.600 * Diskless rdb transfer, done reading from pipe, 1 replicas still up.
1:M 21 Jun 2025 23:49:00.605 * Streamed RDB transfer with replica 172.23.0.7:7004 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
1:M 21 Jun 2025 23:49:00.605 * Synchronization with replica 172.23.0.7:7004 succeeded
1:M 21 Jun 2025 23:49:06.268 * 1 of 1 replicas are in sync when shutting down.
1:M 21 Jun 2025 23:49:06.268 * Calling fsync() on the AOF file.
1:M 21 Jun 2025 23:49:13.911 * Configuration change detected. Reconfiguring myself as a replica of 5ff792fa9f23c6415afef24e709895b7f0f2e037 ()
1:S 21 Jun 2025 23:49:13.911 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 21 Jun 2025 23:49:13.911 * MASTER <-> REPLICA sync started
1:S 21 Jun 2025 23:49:13.915 * Non blocking connect for SYNC fired the event.
1:S 21 Jun 2025 23:49:13.915 * Master replied to PING, replication can continue...
1:S 21 Jun 2025 23:49:13.915 * Trying a partial resynchronization (request 05d7c3ae1451ce9a8908597d50657a7177f76833:1).
1:S 21 Jun 2025 23:49:18.218 * Full resync from master: 7278525adcba750aeb7e9b283f5dab898cca0012:85
1:S 21 Jun 2025 23:49:18.221 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1:S 21 Jun 2025 23:49:18.221 * MASTER <-> REPLICA sync: Flushing old data
1:S 21 Jun 2025 23:49:18.222 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 21 Jun 2025 23:49:18.227 * MASTER <-> REPLICA sync: Finished with success
1:S 22 Jun 2025 00:49:08.347 * MASTER <-> REPLICA sync started
1:S 22 Jun 2025 00:49:08.348 * Non blocking connect for SYNC fired the event.
1:S 22 Jun 2025 00:49:08.349 * Master replied to PING, replication can continue...
1:S 22 Jun 2025 00:49:08.349 * Trying a partial resynchronization (request 7278525adcba750aeb7e9b283f5dab898cca0012:4440).
1:S 22 Jun 2025 00:49:08.349 * Successful partial resynchronization with master.
1:S 22 Jun 2025 00:49:08.349 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
1:S 22 Jun 2025 01:46:32.341 * MASTER <-> REPLICA sync started
1:S 22 Jun 2025 01:46:32.341 * Non blocking connect for SYNC fired the event.
1:S 22 Jun 2025 01:46:32.341 * Master replied to PING, replication can continue...
1:S 22 Jun 2025 01:46:32.342 * Trying a partial resynchronization (request 7278525adcba750aeb7e9b283f5dab898cca0012:7688).
1:S 22 Jun 2025 01:46:32.342 * Successful partial resynchronization with master.
1:S 22 Jun 2025 01:46:32.342 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
1:S 22 Jun 2025 02:06:33.656 * MASTER <-> REPLICA sync started
1:S 22 Jun 2025 02:06:33.656 * Non blocking connect for SYNC fired the event.
1:S 22 Jun 2025 02:06:33.657 * Master replied to PING, replication can continue...
1:S 22 Jun 2025 02:06:33.657 * Trying a partial resynchronization (request 7278525adcba750aeb7e9b283f5dab898cca0012:7730).
1:S 22 Jun 2025 02:06:33.657 * Successful partial resynchronization with master.
1:S 22 Jun 2025 02:06:33.657 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
1:S 22 Jun 2025 07:21:48.442 * MASTER <-> REPLICA sync started
1:S 22 Jun 2025 07:21:48.442 * Non blocking connect for SYNC fired the event.
1:S 22 Jun 2025 07:21:48.445 * Master replied to PING, replication can continue...
1:S 22 Jun 2025 07:21:48.445 * Trying a partial resynchronization (request 7278525adcba750aeb7e9b283f5dab898cca0012:32202).
1:S 22 Jun 2025 07:21:48.446 * Successful partial resynchronization with master.
1:S 22 Jun 2025 07:21:48.446 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
1:S 22 Jun 2025 07:26:10.164 * MASTER <-> REPLICA sync started
1:S 22 Jun 2025 07:26:10.164 * Non blocking connect for SYNC fired the event.
1:S 22 Jun 2025 07:26:10.164 * Master replied to PING, replication can continue...
1:S 22 Jun 2025 07:26:10.164 * Trying a partial resynchronization (request 7278525adcba750aeb7e9b283f5dab898cca0012:32272).
1:S 22 Jun 2025 07:26:10.165 * Successful partial resynchronization with master.
1:S 22 Jun 2025 07:26:10.165 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
1:S 22 Jun 2025 10:18:45.012 * Calling fsync() on the AOF file.
1:S 22 Jun 2025 10:20:04.234 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 22 Jun 2025 10:20:04.234 * MASTER <-> REPLICA sync started
1:S 22 Jun 2025 10:20:04.234 * Non blocking connect for SYNC fired the event.
1:S 22 Jun 2025 10:20:04.237 * Master replied to PING, replication can continue...
1:S 22 Jun 2025 10:20:04.237 * Trying a partial resynchronization (request 4c7eb4d3892748587bb03365f97e5ff5ccbc6324:1).
1:S 22 Jun 2025 10:20:09.229 * Full resync from master: 7278525adcba750aeb7e9b283f5dab898cca0012:46370
1:S 22 Jun 2025 10:20:09.231 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1:S 22 Jun 2025 10:20:09.234 * MASTER <-> REPLICA sync: Flushing old data
1:S 22 Jun 2025 10:20:09.234 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 22 Jun 2025 10:20:09.236 * MASTER <-> REPLICA sync: Finished with success
|
레플리카 노드에서 PSYNC/SYNC 로그를 확인 합니다.
1
| $ docker logs redis-node-4 | grep -Ei 'psync|sync|replica'
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
| 1:S 21 Jun 2025 23:48:55.231 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 21 Jun 2025 23:48:55.231 * MASTER <-> REPLICA sync started
1:S 21 Jun 2025 23:48:55.231 * Non blocking connect for SYNC fired the event.
1:S 21 Jun 2025 23:48:55.231 * Master replied to PING, replication can continue...
1:S 21 Jun 2025 23:48:55.231 * Trying a partial resynchronization (request 39ba0c3263fd29bcec9aece97c634d1250b60dba:1).
1:S 21 Jun 2025 23:48:55.231 - Client closed connection id=3 addr=172.23.0.3:57492 laddr=172.23.0.7:7004 fd=16 name= age=4 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=633 obl=0 oll=0 omem=0 tot-mem=22400 events=r cmd=cluster|replicate user=default redir=-1 resp=2 lib-name= lib-ver=
1:S 21 Jun 2025 23:49:00.599 * Full resync from master: c7b61223f5ece8b771133ce02ecc83759dbea18b:71
1:S 21 Jun 2025 23:49:00.600 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1:S 21 Jun 2025 23:49:00.601 * MASTER <-> REPLICA sync: Flushing old data
1:S 21 Jun 2025 23:49:00.601 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 21 Jun 2025 23:49:00.604 * MASTER <-> REPLICA sync: Finished with success
1:S 21 Jun 2025 23:49:06.274 * MASTER <-> REPLICA sync started
1:S 21 Jun 2025 23:49:06.274 # Error condition on socket for SYNC: Connection refused
1:S 21 Jun 2025 23:49:06.949 * MASTER <-> REPLICA sync started
1:M 21 Jun 2025 23:49:13.909 * Setting secondary replication ID to c7b61223f5ece8b771133ce02ecc83759dbea18b, valid up to offset: 86. New replication ID is 7278525adcba750aeb7e9b283f5dab898cca0012
1:M 21 Jun 2025 23:49:13.915 * Replica 172.23.0.3:7000 asks for synchronization
1:M 21 Jun 2025 23:49:13.915 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '05d7c3ae1451ce9a8908597d50657a7177f76833', my replication IDs are '7278525adcba750aeb7e9b283f5dab898cca0012' and 'c7b61223f5ece8b771133ce02ecc83759dbea18b')
1:M 21 Jun 2025 23:49:13.915 * Delay next BGSAVE for diskless SYNC
1:M 21 Jun 2025 23:49:16.061 * Clear FAIL state for node a2174d976880f6d0e18d0ceb385e034c001aad62 ():replica is reachable again.
1:M 21 Jun 2025 23:49:18.218 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 21 Jun 2025 23:49:18.221 * Diskless rdb transfer, done reading from pipe, 1 replicas still up.
1:M 21 Jun 2025 23:49:18.228 * Streamed RDB transfer with replica 172.23.0.3:7000 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
1:M 21 Jun 2025 23:49:18.228 * Synchronization with replica 172.23.0.3:7000 succeeded
1:M 22 Jun 2025 00:49:08.347 * Connection with replica 172.23.0.3:7000 lost.
1:M 22 Jun 2025 00:49:08.349 * Replica 172.23.0.3:7000 asks for synchronization
1:M 22 Jun 2025 00:49:08.349 * Partial resynchronization request from 172.23.0.3:7000 accepted. Sending 0 bytes of backlog starting from offset 4440.
1:M 22 Jun 2025 01:46:32.340 * Connection with replica 172.23.0.3:7000 lost.
1:M 22 Jun 2025 01:46:32.342 * Replica 172.23.0.3:7000 asks for synchronization
1:M 22 Jun 2025 01:46:32.342 * Partial resynchronization request from 172.23.0.3:7000 accepted. Sending 0 bytes of backlog starting from offset 7688.
1:M 22 Jun 2025 02:06:33.656 * Connection with replica 172.23.0.3:7000 lost.
1:M 22 Jun 2025 02:06:33.657 * Replica 172.23.0.3:7000 asks for synchronization
1:M 22 Jun 2025 02:06:33.657 * Partial resynchronization request from 172.23.0.3:7000 accepted. Sending 0 bytes of backlog starting from offset 7730.
1:M 22 Jun 2025 07:21:48.443 * Connection with replica 172.23.0.3:7000 lost.
1:M 22 Jun 2025 07:21:48.446 * Replica 172.23.0.3:7000 asks for synchronization
1:M 22 Jun 2025 07:21:48.446 * Partial resynchronization request from 172.23.0.3:7000 accepted. Sending 0 bytes of backlog starting from offset 32202.
1:M 22 Jun 2025 07:26:10.164 * Connection with replica 172.23.0.3:7000 lost.
1:M 22 Jun 2025 07:26:10.164 * Replica 172.23.0.3:7000 asks for synchronization
1:M 22 Jun 2025 07:26:10.164 * Partial resynchronization request from 172.23.0.3:7000 accepted. Sending 0 bytes of backlog starting from offset 32272.
1:M 22 Jun 2025 10:18:45.028 * Connection with replica 172.23.0.3:7000 lost.
1:M 22 Jun 2025 10:20:04.237 * Clear FAIL state for node a2174d976880f6d0e18d0ceb385e034c001aad62 ():replica is reachable again.
1:M 22 Jun 2025 10:20:04.237 * Replica 172.23.0.3:7000 asks for synchronization
1:M 22 Jun 2025 10:20:04.237 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '4c7eb4d3892748587bb03365f97e5ff5ccbc6324', my replication IDs are '7278525adcba750aeb7e9b283f5dab898cca0012' and 'c7b61223f5ece8b771133ce02ecc83759dbea18b')
1:M 22 Jun 2025 10:20:04.237 * Delay next BGSAVE for diskless SYNC
1:M 22 Jun 2025 10:20:09.228 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 22 Jun 2025 10:20:09.231 * Diskless rdb transfer, done reading from pipe, 1 replicas still up.
1:M 22 Jun 2025 10:20:09.237 * Streamed RDB transfer with replica 172.23.0.3:7000 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
1:M 22 Jun 2025 10:20:09.237 * Synchronization with replica 172.23.0.3:7000 succeeded
|
레플리카 노드에서 PSYNC가 실패하고 SYNC로 동기화가 진행된 것을 확인할 수 있습니다.
1
| $ docker exec -it redis-node-0 redis-cli -p 7000 -c get psync_key
|
3. 로그 분석
레플리카 노드가 마스터 노드에 재접속할 때, Redis는 우선 부분 동기화를 시도합니다. 하지만 로그를 보면, Replication ID mismatch 에러와 함께 PSYNC가 거부되고 있습니다. 이는 마스터가 가지고 있는 Replication ID와 레플리카가 요청한 ID가 다르기 때문인데, 이 경우 마스터는 PSYNC 요청을 수락하지 않고 다음 단계로 넘어가게 됩니다.
1
2
| 1:S ... * Trying a partial resynchronization (request 39ba0c3263fd29bcec9aece97c634d1250b60dba:1).
1:M ... * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '39ba0c3263fd29bcec9aece97c634d1250b60dba', my replication IDs are ...)
|
PSYNC가 실패하면 Redis는 자동으로 전체 동기화로 전환합니다. 이 과정에서는 마스터가 전체 DB의 RDB 스냅샷을 생성(BGSAVE) 해 레플리카로 전송합니다. 레플리카는 이를 받아 DB 전체를 다시 로딩하며, 이로써 마스터와 완전히 동일한 데이터 상태로 동기화됩니다. 로그에서 RDB 파일 전송 및 성공적으로 동기화되었다는 메시지가 순서대로 출력됩니다.
1
2
3
4
5
| 1:M ... * Delay next BGSAVE for diskless SYNC
1:M ... * Starting BGSAVE for SYNC with target: replicas sockets
1:M ... * Diskless rdb transfer, done reading from pipe, 1 replicas still up.
1:M ... * Streamed RDB transfer with replica ... succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
1:M ... * Synchronization with replica ... succeeded
|
정상적으로 백로그가 남아 있고, Replication ID가 일치하는 상황에서는 레플리카가 PSYNC를 요청했을 때 바로 Successful partial resynchronization with master. 라는 메시지가 출력되며, 변경된 데이터만 받아 동기화가 완료됩니다.
1
2
| 1:S ... * Trying a partial resynchronization (request 7278525adcba750aeb7e9b283f5dab898cca0012:4440).
1:S ... * Successful partial resynchronization with master.
|
4. 정리
Redis에서 PSYNC가 실패하면 SYNC로 동기화가 진행되는 것을 확인했습니다. 진짜 SYNC로 동기화가 진행될까? 라는 의문이 있었지만, 실습을 통해 PSYNC가 실패할 경우 SYNC로 동기화가 진행된다는 것을 명확히 알 수 있었네요.