About Pacemaker cluster configuration versions

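Pacemaker's CIB carries a version made up of admin_epoch, epoch, and num_updates. When a node joins the cluster, the versions are compared and the configuration with the highest version becomes the single configuration for the whole cluster.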
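
The "highest version wins" rule amounts to a field-by-field comparison: admin_epoch first, then epoch, then num_updates. The following is a minimal sketch of that ordering, not Pacemaker's actual implementation. The function name cib_version_cmp and the convention of passing a tuple as a space-separated string are assumptions made for illustration.

```shell
# Compare two CIB version tuples, each written as "admin_epoch epoch num_updates".
# Prints "first", "second", or "equal" depending on which tuple is higher.
# (Hypothetical helper; not part of Pacemaker.)
cib_version_cmp() {
    # Word-split both tuples: $1..$3 is the first tuple, $4..$6 the second.
    set -- $1 $2
    if   [ "$1" -ne "$4" ]; then [ "$1" -gt "$4" ] && echo first || echo second
    elif [ "$2" -ne "$5" ]; then [ "$2" -gt "$5" ] && echo first || echo second
    elif [ "$3" -ne "$6" ]; then [ "$3" -gt "$6" ] && echo first || echo second
    else echo equal
    fi
}

cib_version_cmp "0 48308 0" "0 48307 0"   # prints: first  (higher epoch wins)
cib_version_cmp "1 5 0" "0 48308 9"       # prints: first  (admin_epoch outranks epoch)
```

Because the comparison stops at the first field that differs, a larger admin_epoch outranks any epoch, and a larger epoch outranks any num_updates.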

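Of the three fields, admin_epoch normally never changes; epoch is incremented on every "configuration" change, which also resets num_updates to 0; and num_updates is incremented on every "status" change. "Configuration" means the persistent content under the configuration node of the CIB, including cluster properties, node attributes with a forever lifetime, and resource attributes. "Status" means dynamic information such as node attributes with a reboot lifetime, whether a node is alive, and whether a resource is started.

Status can usually be rebuilt through monitor operations (unless the RA script is poorly designed), but a wrong configuration can break the cluster. What deserves the most attention is therefore how epoch changes and how a joining node affects the cluster configuration. In particular, some RA scripts that support a master/slave architecture modify the configuration dynamically (for example mysql_REPL_INFO for mysql and pgsql-data-status for pgsql), and once the configuration becomes inconsistent the cluster may fail.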
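
The counter rules described above can be written down as a tiny model. This is only a sketch of the behaviour stated in this article, not code taken from Pacemaker. The function name next_version and the change kinds "config" and "status" are made up for illustration, and the admin_epoch value of 0 is a placeholder, since the article does not show it.

```shell
# next_version KIND "admin_epoch epoch num_updates"
# Prints the tuple that results from one change of the given kind.
# (Hypothetical model of the rules described in the text.)
next_version() {
    kind=$1
    set -- $2                                     # $1=admin_epoch $2=epoch $3=num_updates
    case "$kind" in
        config) echo "$1 $(( $2 + 1 )) 0" ;;      # epoch + 1, num_updates reset to 0
        status) echo "$1 $2 $(( $3 + 1 ))" ;;     # only num_updates + 1
        *) echo "unknown change kind: $kind" >&2; return 1 ;;
    esac
}

v="0 48304 4"                   # epoch and num_updates from section 2.2; admin_epoch assumed
v=$(next_version config "$v")   # a cluster property is set
echo "$v"                       # prints: 0 48305 0
v=$(next_version config "$v")   # a forever node attribute is set
echo "$v"                       # prints: 0 48306 0
v=$(next_version status "$v")   # a reboot node attribute is set
echo "$v"                       # prints: 0 48306 1
```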

1. What the manual says

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html#idm140225199219024

3.2. Configuration Version

When a node joins the cluster, the cluster will perform a check to see who has the best configuration based on the fields below. It then asks the node with the highest (admin_epoch, epoch, num_updates) tuple to replace the configuration on all the nodes - which makes setting them, and setting them correctly, very important.

Table 3.1. Configuration Version Properties

Field        Description
admin_epoch  Never modified by the cluster. Use this to make the configurations on any inactive nodes obsolete. Never set this value to zero, in such cases the cluster cannot tell the difference between your configuration and the "empty" one used when nothing is found on disk.
epoch        Incremented every time the configuration is updated (usually by the admin)
num_updates  Incremented every time the configuration or status is updated (usually by the cluster)
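
The three fields are attributes of the top-level cib element, which is why the checks in this article grep the output of cibadmin -Q for "epoch". A small filter can pull them out as one tuple. This is a sketch under assumptions: the function name cib_version is made up, and the sample header below is hand-written for illustration (including its admin_epoch value), not captured cluster output.

```shell
# Read CIB XML on stdin and print "admin_epoch epoch num_updates".
# (Hypothetical helper; assumes the opening <cib ...> tag sits on one line.)
cib_version() {
    # Keep only the opening <cib ...> tag; the counters are its attributes.
    header=$(grep -m 1 -o '<cib [^>]*>')
    for field in admin_epoch epoch num_updates; do
        # The leading space stops "epoch" from matching inside "admin_epoch".
        printf '%s\n' "$header" | sed -n "s/.* $field=\"\([0-9]*\)\".*/\1/p"
    done | paste -s -d ' ' -
}

# Hand-written sample header, for illustration only.
sample='<cib admin_epoch="0" epoch="48304" num_updates="4">'
printf '%s\n' "$sample" | cib_version
# prints: 0 48304 4
```

On a cluster node the same filter would be fed from the live CIB, for example with cibadmin -Q | cib_version.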

2. Hands-on verification

2.1 Environment

Three machines: srdsdevapp69, srdsdevapp71, and srdsdevapp73
OS: CentOS 6.3
Pacemaker: 1.1.14-1.el6 (Build: 70404b0)
Corosync: 1.4.1-7.el6

2.2 Basic verification

0. Initially, epoch="48304" and num_updates="4"

[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch

1. Updating the cluster configuration increments epoch by 1 and resets num_updates to 0

[root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo1 -v "1"
[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch

2. If the new value is the same as the existing value, epoch does not change

[root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo1 -v "1"
[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch

3. Updating a node attribute whose lifetime is forever also increments epoch by 1

[root@srdsdevapp69 mysql_ha]# crm_attribute -N `hostname` -l forever -n foo2 -v 2
[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch

4. Updating a node attribute whose lifetime is reboot increments num_updates by 1

[root@srdsdevapp69 mysql_ha]# crm_attribute -N `hostname` -l reboot -n foo3 -v 2
[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
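
Each check above boils down to comparing the version before and after a command. A minimal sketch of that comparison follows. The function name classify_change is made up for illustration, both tuples are assumed to have been extracted already as "admin_epoch epoch num_updates" strings, and the admin_epoch value of 0 is a placeholder.

```shell
# classify_change BEFORE AFTER
# Prints "config" if epoch moved, "status" if only num_updates moved,
# and "none" if neither counter changed. admin_epoch is assumed unchanged.
# (Hypothetical helper.)
classify_change() {
    set -- $1 $2                  # $1..$3 = before, $4..$6 = after
    if   [ "$2" -ne "$5" ]; then echo config
    elif [ "$3" -ne "$6" ]; then echo status
    else echo none
    fi
}

classify_change "0 48304 4" "0 48305 0"   # prints: config
classify_change "0 48306 0" "0 48306 1"   # prints: status
classify_change "0 48305 0" "0 48305 0"   # prints: none
```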

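2.3 Partition verification

1. Artificially cut the network between srdsdevapp69 and the other two nodes to form a partition. Before the partition, the DC (Designated Controller) was srdsdevapp73.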

[root@srdsdevapp69 mysql_ha]# iptables -A INPUT -j DROP -s srdsdevapp71
[root@srdsdevapp69 mysql_ha]# iptables -A OUTPUT -j DROP -d srdsdevapp71
[root@srdsdevapp69 mysql_ha]# iptables -A INPUT -j DROP -s srdsdevapp73
[root@srdsdevapp69 mysql_ha]# iptables -A OUTPUT -j DROP -d srdsdevapp73
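
For more than a couple of peers, the isolation rules can be generated rather than typed. The sketch below only prints the commands and changes nothing on the host. The function name partition_rules is made up for illustration. The OUTPUT rules match on destination (-d), because packets leaving this host carry the peer as their destination address.

```shell
# Print (do not run) the iptables rules that cut this host off from each peer.
# (Hypothetical dry-run helper.)
partition_rules() {
    for peer in "$@"; do
        echo "iptables -A INPUT -s $peer -j DROP"    # drop packets coming from the peer
        echo "iptables -A OUTPUT -d $peer -j DROP"   # drop packets going to the peer
    done
}

partition_rules srdsdevapp71 srdsdevapp73
```

Piping the output to sh as root would apply the rules. Note that iptables -F, used later to heal the partition, flushes every rule in the filter table, not only these.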

The epoch did not change on either partition and is still 48306, but srdsdevapp69 made itself the DC of its own partition.

Partition 1 (srdsdevapp69): no quorum

[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch

Partition 2 (srdsdevapp71, srdsdevapp73): has quorum

[root@srdsdevapp71 ~]# cibadmin -Q |grep epoch

2. Make two configuration updates on srdsdevapp69 so that its epoch increases by 2

[root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo4 -v "1"
[root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo5 -v "1"
[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch

3. Make one configuration update on srdsdevapp71 so that its epoch increases by 1

[root@srdsdevapp71 ~]# crm_attribute --type crm_config -s set1 --name foo6 -v "1"
[root@srdsdevapp71 ~]# cibadmin -Q |grep epoch

4. Restore the network and check the cluster configuration again

[root@srdsdevapp69 mysql_ha]# iptables -F
[root@srdsdevapp69 mysql_ha]# cibadmin -Q |grep epoch
[root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo5 -q
1
[root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo4 -q
1
[root@srdsdevapp69 mysql_ha]# crm_attribute --type crm_config -s set1 --name foo6 -q
Error performing operation: No such device or address

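The cluster adopted the configuration of the srdsdevapp69 partition because its version was higher, so the update made in the srdsdevapp71/srdsdevapp73 partition was lost.
This test exposes a problem: the configuration of a partition that holds quorum can be overwritten by the configuration of a partition that does not. Anyone developing their own RA needs to keep this in mind.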
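
The outcome can be predicted from the version tuples alone, since quorum plays no part in the comparison. The sketch below picks the node with the highest tuple from a list. It illustrates the rule quoted from the manual and is not Pacemaker's code. The function name pick_winner is made up, the epoch values follow from the steps above (48306 plus 2 on srdsdevapp69, plus 1 in the other partition), and the admin_epoch and num_updates values of 0 are assumptions, since the article does not show them.

```shell
# Read lines of the form "node admin_epoch epoch num_updates" on stdin
# and print the node whose tuple is highest.
# (Hypothetical helper.)
pick_winner() {
    # Sort numerically by admin_epoch, then epoch, then num_updates;
    # the last line after sorting holds the highest tuple.
    sort -k2,2n -k3,3n -k4,4n | tail -n 1 | cut -d ' ' -f 1
}

printf '%s\n' \
    'srdsdevapp69 0 48308 0' \
    'srdsdevapp71 0 48307 0' \
    'srdsdevapp73 0 48307 0' | pick_winner
# prints: srdsdevapp69
```

The minority partition wins here simply because it made one more configuration change while isolated.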