I have built a 3-node cluster with Pacemaker and Corosync. When I deliberately pull the network cable from one of the nodes, B (as a disaster-recovery test), node A or C takes over the VIP. But when I later plug the cable back into B, the VIP switches back to B, which should not happen.
I want A or C to keep the VIP. Here is my Pacemaker configuration:
configure
primitive baseos-ping-check ocf:pacemaker:ping params host_list="1.2.3.4" multiplier="1000" dampen="0" attempts="2" \
    op start interval="0s" timeout="60s" \
    op monitor interval="2s" timeout="60s" \
    op stop interval="0s" timeout="60s" on-fail="ignore"
primitive baseos-vip-master ocf:heartbeat:IPaddr2 \
    params ip="192.67.23.145" iflabel="MR" cidr_netmask="255.255.255.0" \
    op start interval="0s" \
    op monitor interval="10s" \
    op stop interval="0s"
clone cl_baseos-ping-check baseos-ping-check meta interleave="true"
location loc-vip-master baseos-vip-master \
    rule $id="loc-vip-master-rule" $role="master" 100: #uname eq ECS01 \
    rule $id="loc-vip-master-rule-0" $role="master" -inf: not_defined pingd or pingd lte 0
property expected-quorum-votes="1"
property stonith-enabled="false"
property maintenance-mode="false"
property cluster-recheck-interval="5min"
property default-action-timeout="60s"
property pe-error-series-max="500"
property pe-input-series-max="500"
property pe-warn-series-max="500"
property no-quorum-policy="ignore"
property dc-version="1.1.16-94ff4df"
property cluster-infrastructure="corosync"
rsc_defaults resource-stickiness="150"
rsc_defaults migration-threshold="3"
commit
quit

My Corosync configuration looks like this:
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}
totem {
    version: 2
    # How long before declaring a token lost (ms)
    token: 3000
    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 10
    # How long to wait for join messages in the membership protocol (ms)
    join: 60
    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus: 3600
    # Turn off the virtual synchrony filter
    vsftype: none
    # Number of messages that may be sent by one processor on receipt of the token
    max_messages: 20
    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes
    # Enable authentication and encryption of cluster traffic
    secauth: on
    # How many threads to use for encryption/decryption
    threads: 0
    # Optionally assign a fixed node id (integer)
    # nodeid: 1234
    # This specifies the mode of redundant ring, which may be none, active, or passive.
    rrp_mode: none
    interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 10.98.4.0
        #mcastaddr: 0.0.0.0
        mcastport: 5876
        member {
            memberaddr: 10.98.4.103
        }
        member {
            memberaddr: 10.98.4.173
        }
    }
    transport: udpu
}
amf {
    mode: disabled
}
service {
    # Load the Pacemaker Cluster Resource Manager
    ver: 0
    name: pacemaker
}
aisexec {
    user: root
    group: root
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}

My cib.xml looks like this:
The scenario described above only happens when I pull a node's network cable to take it offline. If I instead reboot the node (i.e. B), the VIP stays put on the current node, A or C.
I also noticed that when I plug node B's network cable back in, the IPaddr2 resource calls findif and it fails, because I am not passing the nic parameter. I do provide cidr_netmask, so ideally findif should be able to resolve node B's IP address.
Is there any way to avoid the findif failure?
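One way to sidestep the findif lookup entirely (a sketch, not part of my current configuration: the interface name eth0 is an assumption, and cidr_netmask is written as a prefix length instead of a dotted mask) would be to tell IPaddr2 the interface explicitly through the nic parameter, e.g. via crm configure edit:

primitive baseos-vip-master ocf:heartbeat:IPaddr2 \
    params ip="192.67.23.145" nic="eth0" iflabel="MR" cidr_netmask="24" \
    op start interval="0s" \
    op monitor interval="10s" \
    op stop interval="0s"

With nic set, the agent no longer has to derive the interface from the routing table when the link comes back up.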
Posted on 2019-09-13 17:59:58
As noted in the comments under your question: when the node rejoins the cluster, the cluster finds the VIP running on more than one node, so it has to recover the service (stop the VIP everywhere, then start it again), and it just happens to pick node B.
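The scores behind a placement decision like this can be inspected on the live cluster (a diagnostic sketch; the resource name matches the configuration above):

crm_simulate -sL | grep baseos-vip-master

Keep in mind that resource-stickiness only adds to the score of the node a resource is currently active on. Once the cluster has to stop the VIP everywhere to recover from it running in more than one place, no node carries the stickiness bonus anymore, so stickiness by itself cannot prevent the fail-back.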
In a production cluster you would use fencing/STONITH and you would not ignore quorum. With that in place, when you unplug node B from the network, an out-of-band STONITH agent powers node B off, so node B rejoins the cluster in a "fresh" state with no services running, and the VIP does not fail back to node B.
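Purely as an illustration (nothing below comes from your setup: the fence agent choice, BMC address and credentials are placeholders, and parameter names vary between fence-agents versions), a minimal fencing arrangement for one node plus stricter quorum handling could look roughly like this:

crm configure
# placeholder BMC address and credentials for node ECS01
primitive fence-ecs01 stonith:fence_ipmilan \
    params pcmk_host_list="ECS01" ip="10.98.4.250" username="admin" password="secret" lanplus="1" \
    op monitor interval="60s"
# a fence device should not run on the node it is meant to fence
location loc-fence-ecs01 fence-ecs01 -inf: #uname eq ECS01
property stonith-enabled="true"
# with three votes available there is no need to ignore quorum
property no-quorum-policy="stop"
commit
quit

You would define one such fence device for each of the three nodes.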
https://serverfault.com/questions/983932