
MHA GTID-based failover: code walkthrough


This note supplements the article below and describes the processing flow of MHA GTID-based failover.
http://blog.chinaunix.net/uid-20726500-id-5700631.html

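MHA treats a failover as GTID-based only when all three of the following conditions hold (see the function get_gtid_status):
gtid_mode is enabled (ON) on all nodes
Executed_Gtid_Set is non-empty on all nodes
Auto_Position=1 on at least one node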
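
The three conditions can be sketched as a single predicate. This is a minimal illustration, not MHA's own code (MHA is written in Perl): the function name `is_gtid_failover`, the dict layout, and the boolean key `gtid_enabled` are assumptions made for the example, while `Executed_Gtid_Set` and `Auto_Position` follow the SHOW SLAVE STATUS field names.

```python
def is_gtid_failover(nodes):
    """Return True when all three conditions listed above hold.

    `nodes` is a list of dicts (a hypothetical layout for this sketch):
      gtid_enabled      -- bool, whether GTID mode is on for the node
      Executed_Gtid_Set -- str, empty when no GTID has been executed
      Auto_Position     -- int, 1 when the slave uses auto-positioning
    """
    if not nodes:
        return False
    # Condition 1: GTID mode is enabled on every node.
    if not all(n.get("gtid_enabled") for n in nodes):
        return False
    # Condition 2: every node has a non-empty Executed_Gtid_Set.
    if not all(n.get("Executed_Gtid_Set") for n in nodes):
        return False
    # Condition 3: at least one node replicates with Auto_Position=1.
    return any(n.get("Auto_Position") == 1 for n in nodes)
```

If any one condition fails, the predicate returns False, which corresponds to MHA falling back to the non-GTID failover path.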


GTID-based MHA failover
MHA::MasterFailover::main()
-> do_master_failover
  Phase 1: Configuration Check Phase
  -> check_settings:
       check_node_version: check the MHA version information
       connect_all_and_read_server_status: confirm that the MySQL instance on each node can be connected to
       get_dead_servers/get_alive_servers/get_alive_slaves: double-check whether each node is dead or alive
       start_sql_threads_if: check whether Slave_SQL_Running is Yes; if not, start the SQL thread

  Phase 2: Dead Master Shutdown Phase (for us, its only effect is to stop the I/O thread)
  -> force_shutdown($dead_master):
       stop_io_thread: stop the I/O thread on every slave (the master is about to be stopped)
       force_shutdown_internal (in practice this runs the master_ip_failover_script/shutdown_script from the configuration file; nothing is run if neither is set):
         master_ip_failover_script: if a VIP is configured, switch the VIP first
         shutdown_script: if a shutdown script is configured, run it

  Phase 3: Master Recovery Phase
  -> Phase 3.1: Getting Latest Slaves Phase (find the latest slaves)
       read_slave_status: get each slave's binlog file/position
         check_slave_status: runs "SHOW SLAVE STATUS" to collect the following fields from the slave:
           Slave_IO_State, Master_Host,
           Master_Port, Master_User,
           Slave_IO_Running, Slave_SQL_Running,
           Master_Log_File, Read_Master_Log_Pos,
           Relay_Master_Log_File, Last_Errno,
           Last_Error, Exec_Master_Log_Pos,
           Relay_Log_File, Relay_Log_Pos,
           Seconds_Behind_Master, Retrieved_Gtid_Set,
           Executed_Gtid_Set, Auto_Position,
           Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
           Replicate_Ignore_Table, Replicate_Wild_Do_Table,
           Replicate_Wild_Ignore_Table
       identify_latest_slaves:
         compares Master_Log_File/Read_Master_Log_Pos across the slaves to find the latest slaves
       identify_oldest_slaves:
         compares Master_Log_File/Read_Master_Log_Pos across the slaves to find the oldest slaves
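
The comparison in Phase 3.1 can be sketched as follows. This is an illustrative Python version under stated assumptions, not MHA's Perl implementation: each slave is a dict holding the Master_Log_File and Read_Master_Log_Pos values from SHOW SLAVE STATUS, binlog file names are assumed to end in a numeric sequence such as `mysql-bin.000010`, and the helper `binlog_key` is a hypothetical name.

```python
import re


def binlog_key(slave):
    """Sort key: (numeric suffix of Master_Log_File, Read_Master_Log_Pos)."""
    match = re.search(r"\.(\d+)$", slave["Master_Log_File"])
    sequence = int(match.group(1)) if match else -1
    return (sequence, int(slave["Read_Master_Log_Pos"]))


def identify_latest_slaves(slaves):
    """All slaves that have received the furthest master binlog position."""
    best = max(binlog_key(s) for s in slaves)
    return [s for s in slaves if binlog_key(s) == best]


def identify_oldest_slaves(slaves):
    """All slaves that have received the least advanced position."""
    worst = min(binlog_key(s) for s in slaves)
    return [s for s in slaves if binlog_key(s) == worst]
```

Both functions return lists because several slaves can sit at exactly the same position. The file sequence is compared before the offset, so a small offset in a newer file still ranks ahead of a large offset in an older one.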

  -> Phase 3.2: Determining New Master Phase
       get_most_advanced_latest_slave: find the slave whose (Relay_Master_Log_File, Exec_Master_Log_Pos) is the most advanced

       select_new_master: choose the new master node
         # If a preferred node is specified, one of the active preferred nodes will be the new master.
         # If the latest server is too far behind (e.g. its SQL thread was stopped for an online backup),
         # we should not use it as the new master; we should fetch relay logs from it instead. Even if a preferred
         # master is configured, it does not become the master if it is far behind.
         get_candidate_masters:
           the nodes configured with candidate_master>0 in the configuration file
         get_bad_candidate_masters:
           # The following servers cannot be master:
           # - dead servers
           # - servers with no_master set in the configuration file (e.g. DR servers)
           # - log_bin is disabled
           # - major version is not the oldest
           # - too much replication delay (the binlog position gap between the slave and the master exceeds 100000000)
         Search among candidate_master slaves that have received the latest relay log events
         if NOT FOUND:
           Search among all candidate_master slaves
           if NOT FOUND:
             Search among all slaves that have received the latest relay log events
             if NOT FOUND:
               Search among all slaves
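
The four-tier search order above can be sketched like this. It is a simplified Python illustration rather than MHA's actual select_new_master: the parameters `latest` and `bad` (sets of server names) and the dict keys `name` and `candidate_master` are assumptions for the example, and the first acceptable server in configuration order wins within each tier.

```python
def select_new_master(slaves, latest, bad):
    """Pick a new master name, or None when no server qualifies.

    slaves -- list of dicts with 'name' and optional 'candidate_master'
    latest -- set of names that received the latest relay log events
    bad    -- set of names that must not become master
    """
    def first_acceptable(pool):
        for server in pool:
            if server["name"] not in bad:
                return server["name"]
        return None

    candidates = [s for s in slaves if s.get("candidate_master", 0) > 0]
    tiers = [
        [s for s in candidates if s["name"] in latest],  # latest candidates
        candidates,                                      # any candidate
        [s for s in slaves if s["name"] in latest],      # latest slaves
        slaves,                                          # any slave
    ]
    for pool in tiers:
        chosen = first_acceptable(pool)
        if chosen is not None:
            return chosen
    return None
```

Expressing the tiers as an ordered list keeps the fallback order visible in one place; each later tier is only consulted when every server in the earlier tiers is excluded.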

  -> Phase 3.3: New Master Recovery Phase
       recover_master_gtid_internal:
         wait_until_relay_log_applied
         stop_slave
         If the new master is not a slave that holds the latest relay log:
           $latest_slave->wait_until_relay_log_applied: wait until Exec_Master_Log_Pos equals Read_Master_Log_Pos on the slave with the latest relay log
           change_master_and_start_slave( $target, $latest_slave )
           wait_until_in_sync( $target, $latest_slave )
         save_from_binlog_server:
           iterate over all binlog servers and run save_binary_logs --command=save to fetch the remaining binlog events
         apply_binlog_to_master:
           apply the binlog events fetched from the binlog servers (if any)
         If master_ip_failover_script is set, call $master_ip_failover_script --command=start to bring up the VIP
         If skip_disable_read_only is not set, set read_only=0

  Phase 4: Slaves Recovery Phase
    recover_slaves_gtid_internal
    -> Phase 4.1: Starting Slaves in parallel
         run change_master_and_start_slave on every slave
         If wait_until_gtid_in_sync is set, wait for each slave to catch up via "SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(?,0)"

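  Phase 5: New master cleanup phase
    reset_slave_on_new_master
      Cleaning up the new master simply means resetting its slave info, i.e. discarding its previous replication settings. At this point the master failover is complete.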



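The online switch flow with GTID enabled is the same as without GTID (the only difference is the CHANGE MASTER statement that is executed), so it is omitted here.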
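
To make the difference in the CHANGE MASTER statement concrete, here is a small sketch that builds both forms. The function `build_change_master` and its parameters are hypothetical and not taken from MHA; `MASTER_AUTO_POSITION`, `MASTER_LOG_FILE` and `MASTER_LOG_POS` are standard MySQL options. The password clause is left out for brevity, and a real tool should not build SQL from untrusted input by string formatting.

```python
def build_change_master(host, port, user, use_gtid,
                        log_file=None, log_pos=None):
    """Build a CHANGE MASTER TO statement for either replication mode."""
    parts = [
        f"MASTER_HOST='{host}'",
        f"MASTER_PORT={int(port)}",
        f"MASTER_USER='{user}'",
    ]
    if use_gtid:
        # GTID mode: the slave works out its own starting point.
        parts.append("MASTER_AUTO_POSITION=1")
    else:
        # Position mode: the caller must supply explicit coordinates.
        if log_file is None or log_pos is None:
            raise ValueError("log_file and log_pos are required without GTID")
        parts.append(f"MASTER_LOG_FILE='{log_file}'")
        parts.append(f"MASTER_LOG_POS={int(log_pos)}")
    return "CHANGE MASTER TO " + ", ".join(parts)
```

With GTID the statement carries no binlog coordinates at all, which is why the GTID flow does not need to compute a file and position for each slave before repointing it.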
