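Code Walkthrough of MHA Failover and Online Switchover
A while ago my colleague Shen Longxing wrote up the code flow of MHA failover and online switchover. With his permission, I am reposting it here. The original text follows.
This article is based on MySQL 5.5, so it does not cover GTID. MHA has two master switchover procedures, failover and rotate: the former applies when the original master is down, and the latter is used for an online switch. Each is explained below.
The failover process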
- MHA::MasterFailover::main()
- ->do_master_failover
- Phase 1: Configuration Check Phase
- -> check_settings:
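- check_node_version: checks the MHA version on each node
- connect_all_and_read_server_status: verifies that the MySQL instance on each node accepts connections
- get_dead_servers/get_alive_servers/get_alive_slaves: double-checks whether each node is dead or alive
- start_sql_threads_if: checks whether Slave_SQL_Running is Yes and starts the SQL thread if it is not
- Phase 2: Dead Master Shutdown Phase: in our setup, its only effect is to stop the IO threads
- -> force_shutdown($dead_master):
- stop_io_thread: stops the IO thread on every slave (so that none of them keeps reading from the master)
- force_shutdown_internal (in practice this just runs the master_ip_failover_script/shutdown_script from the configuration file; nothing runs if neither is set):
- master_ip_failover_script: if a VIP is configured, the VIP is switched first
- shutdown_script: runs the shutdown script if one is configured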
- Phase 3: Master Recovery Phase
- -> Phase 3.1: Getting Latest Slaves Phase (find the latest slave)
- read_slave_status: gets the binlog file/position of each slave
- check_slave_status: runs "SHOW SLAVE STATUS" to collect the following fields from each slave:
- Slave_IO_State, Master_Host,
- Master_Port, Master_User,
- Slave_IO_Running, Slave_SQL_Running,
- Master_Log_File, Read_Master_Log_Pos,
- Relay_Master_Log_File, Last_Errno,
- Last_Error, Exec_Master_Log_Pos,
- Relay_Log_File, Relay_Log_Pos,
- Seconds_Behind_Master, Retrieved_Gtid_Set,
- Executed_Gtid_Set, Auto_Position
- Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
- Replicate_Ignore_Table, Replicate_Wild_Do_Table,
- Replicate_Wild_Ignore_Table
- identify_latest_slaves:
- compares Master_Log_File/Read_Master_Log_Pos across the slaves to find the latest slave
- identify_oldest_slaves:
- compares Master_Log_File/Read_Master_Log_Pos across the slaves to find the oldest slave
- -> Phase 3.2: Saving Dead Master's Binlog Phase:
- save_master_binlog:
- if the dead master is reachable over ssh, the following branch is taken:
- save_master_binlog_internal: (runs the node package's save_binary_logs script on the dead master to copy the binlog)
- save_binary_logs --command=save --start_file=mysql-bin.000281 --start_pos=107 --binlog_dir=/opt/mysql/data/binlog --output_file=/opt/mha/log/saved_master_binlog_from_10.27.177.245_3306_20160108211857.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
- generate_diff_binary_log:
- concat_all_binlogs_from:
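- dump_binlog: dumps the binlog file into the target file, reading it in binmode
- dump_binlog_header_fde: reads from offset 0 to position-1
- dump_binlog_from_pos: dumps the binlog file into the target file, starting at position
- file_copy:
- copies the binlog file generated above to the manager_workdir directory on the manager node
- if the dead master cannot be reached over ssh, any transactions on the master that were not yet replicated to a slave are lost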
- -> Phase 3.3: Determining New Master Phase
- find_latest_base_slave:
- find_latest_base_slave_internal:
- pos_cmp( $oldest_mlf, $oldest_mlp, $latest_mlf, $latest_mlp )
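- checks whether the latest and oldest slaves are at the same binlog position; if they are, no relay log synchronization is needed
- apply_diff_relay_logs --command=find --latest
- checks whether the latest slave still holds the relay logs that the oldest slave is missing; if it does, the process continues, otherwise the failover fails
- the search is simple: read the latest slave's relay log files in reverse order until the target file/position is found
- select_new_master: selects the new master node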
- If preferred node is specified, one of active preferred nodes will be new master.
- If the latest server behinds too much (i.e. stopping sql thread for online backups),
- we should not use it as a new master, we should fetch relay log there. Even though preferred
- master is configured, it does not become a master if it's far behind.
- get_candidate_masters:
- these are the nodes configured with candidate_master>0 in the configuration file
- get_bad_candidate_masters:
- # The following servers can not be master:
- # - dead servers
- # - Set no_master in conf files (i.e. DR servers)
- # - log_bin is disabled
- # - Major version is not the oldest
- # - too much replication delay (the binlog position gap between the slave and the master exceeds 100000000)
- Searching from candidate_master slaves which have received the latest relay log events
- if NOT FOUND:
- Searching from all candidate_master slaves
- if NOT FOUND:
- Searching from all slaves which have received the latest relay log events
- if NOT FOUND:
- Searching from all slaves
- -> Phase 3.4: New Master Diff Log Generation Phase
- recover_relay_logs:
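- checks whether the new master is the latest slave; if it is not, uses the apply_diff_relay_logs command to generate the differential log and sends it to the new master
- recover_master_internal:
- sends the dead master's binlog saved in Phase 3.2 to the new master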
- -> Phase 3.5: Master Log Apply Phase
- recover_slave:
- apply_diff:
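- 0. wait_until_relay_log_applied: waits until the new master has finished applying its relay log
- 1. checks whether Exec_Master_Log_Pos == Read_Master_Log_Pos;
- if they differ, uses save_binary_logs --command=save to generate the differential log
- 2. calls the apply_diff_relay_logs command to recover the new master, where:
- 2.1 the logs to recover come in three parts:
- exec_diff: the difference between Exec_Master_Log_Pos and Read_Master_Log_Pos
- read_diff: the relay log difference between the new master and the latest slave
- binlog_diff: the binlog difference between the latest slave and the dead master
- in practice, apply_diff_relay_logs simply invokes the mysqlbinlog command to perform the recovery
- // if a VIP is configured, master_ip_failover_script must be called to fail over the VIP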
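The position comparisons used in Phase 3.1 and Phase 3.3 can be illustrated with a small Python sketch. It is not MHA's Perl code: the SlaveStatus class, the binlog_sequence helper and the sample hosts are assumptions made for illustration, and the sketch assumes binlog names end in a numeric suffix such as mysql-bin.000281.

```python
from dataclasses import dataclass


@dataclass
class SlaveStatus:
    """Hypothetical container for the two SHOW SLAVE STATUS fields used here."""
    host: str
    master_log_file: str       # Master_Log_File
    read_master_log_pos: int   # Read_Master_Log_Pos


def binlog_sequence(log_file: str) -> int:
    """Return the numeric suffix of a binlog name, e.g. 281 for mysql-bin.000281."""
    return int(log_file.rsplit(".", 1)[1])


def pos_cmp(a_file: str, a_pos: int, b_file: str, b_pos: int) -> int:
    """Return -1, 0 or 1 when position a is behind, equal to, or ahead of b."""
    a = (binlog_sequence(a_file), a_pos)
    b = (binlog_sequence(b_file), b_pos)
    return (a > b) - (a < b)


def identify_latest_and_oldest(slaves):
    """Return (latest, oldest): the slaves at the most and least advanced read position."""
    def key(s):
        return (binlog_sequence(s.master_log_file), s.read_master_log_pos)

    newest = max(key(s) for s in slaves)
    eldest = min(key(s) for s in slaves)
    latest = [s for s in slaves if key(s) == newest]
    oldest = [s for s in slaves if key(s) == eldest]
    return latest, oldest


# Made-up sample data: slave2 has read the furthest, slave3 is one file behind.
slaves = [
    SlaveStatus("slave1", "mysql-bin.000281", 107),
    SlaveStatus("slave2", "mysql-bin.000281", 4096),
    SlaveStatus("slave3", "mysql-bin.000280", 99999),
]
latest, oldest = identify_latest_and_oldest(slaves)
print([s.host for s in latest], [s.host for s in oldest])
```

When pos_cmp returns 0 for the oldest and latest positions, every slave has received the same events, which is the case where no relay log synchronization is needed.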
- Phase 4: Slaves Recovery Phase
- -> Phase 4.1: Starting Parallel Slave Diff Log Generation Phase
- generates the differential log between each slave and the latest slave, and copies it to each slave's working directory
- -> Phase 4.2: Starting Parallel Slave Log Apply Phase
- recover_slave:
- recovers each slave, in the same way as Phase 3.5
- change_master_and_start_slave:
- points these slaves at the new master with the CHANGE MASTER TO command, then starts replication (start slave)
- Phase 5: New master cleanup phase
- reset_slave_on_new_master
- cleaning up the new master really just means resetting its slave info, i.e. discarding its old replication settings. This completes the master failover
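As an illustration of the change_master_and_start_slave step in Phase 4.2, the following Python sketch builds the SQL that repoints a slave at the new master. It only assembles strings and is not how MHA itself issues the statements. The function names, host, user and password are hypothetical; the file and position would come from 'show master status' on the new master.

```python
def quote(value: str) -> str:
    """Quote a string literal for MySQL, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_repoint_statements(host, port, user, password, log_file, log_pos):
    """Return the statements that point a slave at the new master and restart replication."""
    change_master = (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, "
        f"MASTER_PORT={int(port)}, "
        f"MASTER_USER={quote(user)}, "
        f"MASTER_PASSWORD={quote(password)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, "
        f"MASTER_LOG_POS={int(log_pos)}"
    )
    return [change_master, "START SLAVE"]


# Hypothetical values for illustration only.
for statement in build_repoint_statements(
    "10.0.0.2", 3306, "repl", "secret", "mysql-bin.000001", 107
):
    print(statement + ";")
```

On MySQL 5.5 the replication threads must already be stopped when CHANGE MASTER TO runs, so a real caller would make sure of that first.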
The rotate process
- MHA::MasterRotate::main()
-> do_master_online_switch:
Phase 1: Configuration Check Phase
-> identify_orig_master
connect_all_and_read_server_status:
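connect_check: first runs a connect check to make sure the MySQL service on every server is healthy
connect_and_get_status: gets server_id/mysql_version/log_bin and other information from each MySQL instance
This step also serves another important purpose: identifying the current master node. It runs show slave status,
and if the output is empty, the node is the current master.
validate_current_master: gets the master node's information and validates the configuration
checks whether any server is down; if so, the rotate exits
checks whether the master is alive; if it is dead, the rotate exits
check_repl_priv:
checks whether the user has the replication privilege
acquires monitor_advisory_lock to make sure no other monitor process is running against the master
runs: SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', ?) AS Value
acquires failover_advisory_lock to make sure no other failover process is running against the slaves
runs: SELECT GET_LOCK('MHA_Master_High_Availability_Failover', ?) AS Value
check_replication_health:
runs SHOW SLAVE STATUS to determine the following: current_slave_position/has_replication_problem
has_replication_problem checks the following: IO thread/SQL thread/Seconds_Behind_Master (1s)
get_running_update_threads:
uses show processlist to check whether any thread is currently running an update; if so, the switch exits
-> identify_new_master
set_latest_slaves: all current slaves are treated as latest slaves
select_new_master: selects the new master node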
If preferred node is specified, one of active preferred nodes will be new master.
If the latest server behinds too much (i.e. stopping sql thread for online backups),
we should not use it as a new master, we should fetch relay log there. Even though preferred
master is configured, it does not become a master if it's far behind.
get_candidate_masters:
these are the nodes configured with candidate_master>0 in the configuration file
get_bad_candidate_masters:
# The following servers can not be master:
# - dead servers
# - Set no_master in conf files (i.e. DR servers)
# - log_bin is disabled
# - Major version is not the oldest
# - too much replication delay (the binlog position gap between the slave and the master exceeds 100000000)
Searching from candidate_master slaves which have received the latest relay log events
if NOT FOUND:
Searching from all candidate_master slaves
if NOT FOUND:
Searching from all slaves which have received the latest relay log events
if NOT FOUND:
Searching from all slaves
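The four-step search order of select_new_master can be sketched in Python as follows. This is a simplified illustration, not MHA's Perl implementation: the Server class and its boolean fields are hypothetical, and the is_bad flag stands in for all the checks listed under get_bad_candidate_masters.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical view of one slave as seen by the selection step."""
    host: str
    candidate_master: bool = False  # candidate_master>0 in the configuration file
    is_latest: bool = False         # has received the latest relay log events
    is_bad: bool = False            # dead, no_master, log_bin disabled, too far behind, ...


def select_new_master(slaves):
    """Return the first usable slave, trying the four groups in priority order."""
    usable = [s for s in slaves if not s.is_bad]
    groups = [
        [s for s in usable if s.candidate_master and s.is_latest],
        [s for s in usable if s.candidate_master],
        [s for s in usable if s.is_latest],
        usable,
    ]
    for group in groups:
        if group:
            return group[0]
    return None  # no slave is eligible


# Made-up topology: s3 is a candidate but disqualified, so s2 wins in the second group.
slaves = [
    Server("s1", is_latest=True),
    Server("s2", candidate_master=True),
    Server("s3", candidate_master=True, is_latest=True, is_bad=True),
]
chosen = select_new_master(slaves)
print(chosen.host if chosen else "no eligible master")
```

Filtering out the bad servers once, before building the groups, keeps each of the four searches a plain list filter.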
Phase 2: Rejecting updates Phase
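reject_update: locks the tables to stop writes to the binlog
if the "master_ip_online_change_script" parameter is set in the MHA configuration file, that script is run to disable writes on the current master
this script only needs to be set when a VIP is in use
reconnect: makes sure the connection to the master is still healthy
lock_all_tables: runs FLUSH TABLES WITH READ LOCK to lock the tables
check_binlog_stop: runs show master status twice in a row to confirm that binlog writes have stopped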
read_slave_status:
get_alive_slaves:
check_slave_status: runs "SHOW SLAVE STATUS" to collect the following fields from each slave:
Slave_IO_State, Master_Host,
Master_Port, Master_User,
Slave_IO_Running, Slave_SQL_Running,
Master_Log_File, Read_Master_Log_Pos,
Relay_Master_Log_File, Last_Errno,
Last_Error, Exec_Master_Log_Pos,
Relay_Log_File, Relay_Log_Pos,
Seconds_Behind_Master, Retrieved_Gtid_Set,
Executed_Gtid_Set, Auto_Position
Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
Replicate_Ignore_Table, Replicate_Wild_Do_Table,
Replicate_Wild_Ignore_Table
switch_master:
switch_master_internal:
master_pos_wait: calls select master_pos_wait() and waits until the slave has caught up with the master
get_new_master_binlog_position: runs 'show master status'
Allow write access on the new master:
calls master_ip_online_change_script --command=start ..., pointing the VIP at the new master
disable_read_only:
runs on the new master: SET GLOBAL read_only=0
switch_slaves:
switch_slaves_internal:
change_master_and_start_slave
change_master:
start_slave:
unlock_tables: runs UNLOCK TABLES on the orig master
Phase 5: New master cleanup phase
reset_slave_on_new_master
release_failover_advisory_lock
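
The check_binlog_stop idea from Phase 2, sampling the master's binlog position twice and comparing, can be sketched in Python like this. It is a minimal sketch under assumptions: binlog_has_stopped and the read_position callback are hypothetical names, and the callback stands in for running show master status against a real server.

```python
import time
from typing import Callable, Tuple

BinlogPosition = Tuple[str, int]  # (File, Position) as reported by SHOW MASTER STATUS


def binlog_has_stopped(read_position: Callable[[], BinlogPosition],
                       interval_seconds: float = 0.0) -> bool:
    """Sample the master's binlog position twice; return True if it did not move."""
    first = read_position()
    time.sleep(interval_seconds)
    second = read_position()
    return first == second


# Stand-ins for a real query: one master that is still writing, one that is idle.
moving = iter([("mysql-bin.000281", 107), ("mysql-bin.000281", 512)])
idle = iter([("mysql-bin.000281", 512), ("mysql-bin.000281", 512)])
print(binlog_has_stopped(lambda: next(moving)))
print(binlog_has_stopped(lambda: next(idle)))
```

Passing the query in as a callback keeps the comparison logic testable without a database connection.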