MySQL 5.7: An Attempt to Speed Up Recovery Using the Replication SQL_Thread
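1. Common MySQL data recovery methods
There are generally three ways to recover MySQL data:
1. The officially recommended full backup + binlog approach. The usual practice is to restore the most recent full backup, then replay the binlog into the target database with mysqlbinlog --start-position --stop-position binlog.000xxx | mysql -uroot -p xxx -S database.
2. Recovery based on master-slave replication. The usual practice is to restore the most recent full backup, attach the restored instance as a slave to the existing master, then use start slave sql_thread until master_log_pos to recover up to a position just before the failure.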
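Method 2 hinges on the UNTIL clause of START SLAVE. As a minimal sketch, the helper below only builds that statement as text; the function name until_stmt is hypothetical and not part of the original procedure. Note that MySQL expects MASTER_LOG_FILE and MASTER_LOG_POS to be given together.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original article): print the statement
# that replays events up to a given master binlog coordinate.
# It only builds text; nothing is sent to a server.
until_stmt() {
  local log_file="$1" log_pos="$2"
  # Refuse an empty or non-numeric position so a typo cannot end up in the SQL.
  case "$log_pos" in
    ''|*[!0-9]*) echo "position must be a number" >&2; return 1 ;;
  esac
  printf "START SLAVE SQL_THREAD UNTIL MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
    "$log_file" "$log_pos"
}
```

Calling until_stmt 3307-binlog.000002 76364214 prints the statement, which can then be reviewed and pasted into a mysql session on the restored instance.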
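This article tries a third approach: applying the binlogs from the original master directly on the slave.
Approach:
A relay log and a binlog are essentially the same format, so can MySQL's own sql_thread be used to apply binlogs incrementally?
1) Initialize a new instance and restore the full backup into it.
2) Find the starting position in the first binlog file, and collect all the remaining binlogs.
3) Disguise the binlogs as relay logs and let the sql thread apply them incrementally.
Use cases:
1. The most recent full backup is far from the point of failure, so recovery with the two methods above takes too long.
2. A dual-master cluster managed by keepalived. Unlike MHA, keepalived has no log-completion mechanism, so a failure can lose data: if replication is badly delayed when a failover to the slave happens, the data becomes inconsistent and the missing logs have to be applied.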
2. Experiment steps
1. Set up master-slave replication (this experiment uses traditional position-based replication; GTID works just as well)
M1:
root@localhost:mysql3307.sock [(none)]>select * from restore.t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    3 |
|  3 |    2 |
|  4 |    3 |
|  5 |    6 |
|  6 |    7 |
|  7 |    9 |
| 10 | NULL |
| 11 |   10 |
+----+------+
9 rows in set (0.00 sec)
M2 (slave):
root@localhost:mysql3307.sock [(none)]>select * from restore.t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    3 |
|  3 |    2 |
|  4 |    3 |
|  5 |    6 |
|  6 |    7 |
|  7 |    9 |
| 10 | NULL |
| 11 |   10 |
+----+------+
9 rows in set (0.00 sec)
root@localhost:mysql3307.sock [restore]>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: m1
Master_User: repl
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: 3307-binlog.000002
Read_Master_Log_Pos: 154
Relay_Log_File: M2-relay-bin.000004
Relay_Log_Pos: 371
Relay_Master_Log_File: 3307-binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 154
Relay_Log_Space: 624
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 13307
Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
Master_Info_File: /data/mysql/3307/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
Record the slave's relay log information at this point:
[root@M2 data]# more M2-relay-bin.index
./M2-relay-bin.000003
./M2-relay-bin.000004
[root@M2 data]# more relay-log.info
7
./M2-relay-bin.000004
371
3307-binlog.000002
154
0
0
1
2. Use sysbench to simulate out-of-sync data
[root@M1 logs]# mysqladmin create sbtest
[root@M1 sysbench]# sysbench --db-driver=mysql --mysql-host=m1 --mysql-port=3307 --mysql-user=sbtest --mysql-password='sbtest' /usr/share/sysbench/oltp_common.lua --tables=4 --table-size=100000 --threads=2 --time=60 --report-interval=10 prepare
While the master is loading data, stop replication on the slave to create an inconsistency:
root@localhost:mysql3307.sock [mysql]>stop slave;
3. Once sysbench has finished, compare the data on the master and the slave
Master:
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest1;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest2;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)
Slave:
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|    67550 |
+----------+
1 row in set (0.06 sec)
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|    70252 |
+----------+
1 row in set (0.04 sec)
The master and the slave are clearly out of sync.
4. Check the slave status at this point:
root@localhost:mysql3307.sock [(none)]>show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: m1
Master_User: repl
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: 3307-binlog.000002
Read_Master_Log_Pos: 76364214
Relay_Log_File: M2-relay-bin.000004
Relay_Log_Pos: 64490301
Relay_Master_Log_File: 3307-binlog.000002
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 64490084
Relay_Log_Space: 76364861
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
Master_Info_File: /data/mysql/3307/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
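The local relay log has not been fully applied yet (Exec_Master_Log_Pos is 64490084 while Read_Master_Log_Pos is 76364214). To keep the experiment accurate, first let the local relay log finish by running start slave sql_thread.
Check again: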
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: m1
Master_User: repl
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: 3307-binlog.000002
Read_Master_Log_Pos: 76364214
Relay_Log_File: M2-relay-bin.000005
Relay_Log_Pos: 4
Relay_Master_Log_File: 3307-binlog.000002
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 76364214
Relay_Log_Space: 154
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
Master_Info_File: /data/mysql/3307/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
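The check above is done by eye: the SQL thread has caught up when the executed coordinate matches the fetched one. A small sketch of the same comparison, assuming the status text is piped in on stdin; the function name relay_fully_applied is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `SHOW SLAVE STATUS\G` output on stdin and succeed
# only when the SQL thread has executed everything the I/O thread fetched,
# i.e. Master_Log_File/Read_Master_Log_Pos match
# Relay_Master_Log_File/Exec_Master_Log_Pos.
relay_fully_applied() {
  awk -F': *' '
    { gsub(/^ +/, "", $1) }   # the client right-aligns field names
    $1 == "Master_Log_File"       { read_file = $2 }
    $1 == "Read_Master_Log_Pos"   { read_pos  = $2 }
    $1 == "Relay_Master_Log_File" { exec_file = $2 }
    $1 == "Exec_Master_Log_Pos"   { exec_pos  = $2 }
    END {
      ok = (read_file != "" && read_file == exec_file && read_pos == exec_pos)
      exit(ok ? 0 : 1)
    }
  '
}
```

With the first status output in this step (Exec_Master_Log_Pos 64490084) the function fails; with the second one (76364214 on both sides) it succeeds.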
The local relay log has now been fully applied. Record the latest relay log information:
[root@M2 data]# more relay-log.info
7
./M2-relay-bin.000005
4
3307-binlog.000002
76364214
0
0
1
0
0
1
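The information above is important: it shows that the slave has executed up to position 76364214 of the master's binlog 000002. Next, we copy the master's binlogs over, disguise them as relay logs, and resume recovery from that position.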
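To avoid misreading the file, the coordinates can be pulled out by line number. This is a sketch that assumes the layout seen above (line 1 is a line count, then relay log name, relay log position, master binlog name, master binlog position); the function name show_relay_info is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the replication coordinates stored in a
# relay-log.info file. Assumed layout: line 1 is a line count, lines 2-5 are
# relay log name, relay log position, master binlog name, master binlog
# position. Later lines are ignored.
show_relay_info() {
  local info_file="$1"
  local _count relay_file relay_pos master_file master_pos
  {
    read -r _count
    read -r relay_file
    read -r relay_pos
    read -r master_file
    read -r master_pos
  } < "$info_file" || return 1
  printf 'relay_log=%s relay_pos=%s master_log=%s master_pos=%s\n' \
    "$relay_file" "$relay_pos" "$master_file" "$master_pos"
}
```

Run against the file above, it reports master_log=3307-binlog.000002 and master_pos=76364214, the point from which recovery has to continue.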
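5. Copy the binlogs to the target and disguise them as relay logs
Before copying, shut down the slave and add skip-slave-start to its cnf so that replication does not start automatically after the restart.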
[root@M2 data]# ll
total 185248
-rw-r----- 1 root root 461 Oct 24 17:14 3307-binlog.000001
-rw-r----- 1 root root 76364609 Oct 24 17:14 3307-binlog.000002
-rw-r----- 1 root root 203 Oct 24 17:14 3307-binlog.000003
-rw-r----- 1 root root 419 Oct 24 17:14 3307-binlog.000004
-rw-r----- 1 root root 164 Oct 24 17:14 3307-binlog.index
-rw-r----- 1 mysql mysql 56 Oct 24 15:08 auto.cnf
-rw-r----- 1 mysql mysql 4720 Oct 24 17:14 ib_buffer_pool
-rw-r----- 1 mysql mysql 12582912 Oct 24 17:14 ibdata1
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:14 ib_logfile0
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:11 ib_logfile1
-rw-r----- 1 mysql mysql 177 Oct 24 17:14 M2-relay-bin.000005
-rw-r----- 1 mysql mysql 22 Oct 24 17:11 M2-relay-bin.index
-rw-r----- 1 mysql mysql 122 Oct 24 17:14 master.info
drwxr-x--- 2 mysql mysql 4096 Oct 24 15:07 mysql
-rw------- 1 root root 0 Oct 24 15:08 nohup.out
drwxr-x--- 2 mysql mysql 4096 Oct 24 15:07 performance_schema
-rw-r----- 1 mysql mysql 68 Oct 24 17:14 relay-log.info
drwxr-x--- 2 mysql mysql 4096 Oct 24 15:07 restore
drwxr-x--- 2 mysql mysql 4096 Oct 24 16:47 sbtest
drwxr-x--- 2 mysql mysql 12288 Oct 24 15:07 sys
-rw-r----- 1 mysql mysql 24 Oct 24 15:07 xtrabackup_binlog_pos_innodb
-rw-r----- 1 mysql mysql 577 Oct 24 15:07 xtrabackup_info
Copy them under relay log names:
[root@M2 data]# cp 3307-binlog.000001 relay.000001
[root@M2 data]# cp 3307-binlog.000002 relay.000002
[root@M2 data]# cp 3307-binlog.000003 relay.000003
[root@M2 data]# cp 3307-binlog.000004 relay.000004
Fix the file ownership:
[root@M2 data]# chown mysql.mysql -R *
Edit the relay log index file so that the server can find the new files:
[root@M2 data]# cat M2-relay-bin.index
./relay.000001
./relay.000002
./relay.000003
./relay.000004
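With more than a handful of binlogs, the copy and the index entries are easier to generate than to type. A sketch under the assumption that binlog names end in a numeric sequence suffix, as they do here; the function name stage_binlogs_as_relay is hypothetical, and ownership still has to be fixed afterwards as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every binlog named in a binlog index file, copy
# the file found in the data directory to relay.<sequence> and print the
# matching relay index entry. Assumes names such as 3307-binlog.000002.
stage_binlogs_as_relay() {
  local datadir="$1" binlog_index="$2" name seq
  while read -r name; do
    [ -n "$name" ] || continue          # skip blank lines
    name="${name##*/}"                  # drop ./ or the master's directory
    seq="${name##*.}"                   # 3307-binlog.000002 -> 000002
    cp "$datadir/$name" "$datadir/relay.$seq" || return 1
    printf './relay.%s\n' "$seq"
  done < "$binlog_index"
}
```

Redirecting the output into M2-relay-bin.index gives the index content listed above, for example stage_binlogs_as_relay /data/mysql/3307/data /data/mysql/3307/data/3307-binlog.index.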
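Edit the relay log info file to tell the server where to start applying. Because relay.000002 is a straight copy of 3307-binlog.000002, the relay log position is the same as the master binlog position, 76364214:
[root@M2 data]# cat relay-log.info
7
./relay.000002
76364214
3307-binlog.000002
76364214
0
0
1
0
0
1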
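The same content can be generated rather than hand-edited. This is only a sketch: the function name relay_info_content is hypothetical, and the values after the positions (0, 0, 1) are copied from the file recorded in step 1 on the assumption that they are unchanged. Compare the result with a relay-log.info written by your own server before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print relay-log.info content that points the SQL
# thread at a copied binlog. The same position is used twice because the
# relay file is a copy of the master binlog, so offsets are identical.
# Assumption: the leading 7 and the trailing 0/0/1 mirror the file that the
# server itself wrote in step 1.
relay_info_content() {
  local relay_file="$1" master_file="$2" pos="$3"
  printf '7\n%s\n%s\n%s\n%s\n0\n0\n1\n' \
    "$relay_file" "$pos" "$master_file" "$pos"
}
```

For this experiment the call would be relay_info_content ./relay.000002 3307-binlog.000002 76364214, with the output redirected to relay-log.info while the slave is shut down.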
Finally, start the sql_thread to begin the fast recovery:
start slave sql_thread;
6. Check that the data is consistent
Slave:
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)
The slave has recovered all of the missing data.
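Checking table by table gets tedious with more tables. As a sketch, assuming the counts from each side have been saved as "table count" lines in two text files (file names and the function name diff_counts are hypothetical), the comparison can be automated. Row counts are only a coarse check and do not prove the rows are identical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two files of "table count" lines, the first
# from the master and the second from the slave, and print every table in
# the slave file whose count differs from (or is missing on) the master.
diff_counts() {
  awk '
    NR == FNR { master[$1] = $2; next }   # first file: remember master counts
    !($1 in master) || master[$1] != $2 {
      print $1 ": master=" master[$1] " slave=" $2
    }
  ' "$1" "$2"
}
```

With the counts seen in step 3 it would print sbtest3: master=100000 slave=70252, and after the recovery it prints nothing.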