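KingbaseES R6 Cluster: Manually Configuring a VIP

Users often ask how to add a VIP after a V8R6 cluster has been deployed without one. This article walks through adding the VIP manually.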

I. Operating system environment

Operating system (UOS):

root@uos01:~# cat /etc/issue

Uniontech OS Server 20 Enterprise \n \l



Database:

test=# select version();

                                                       version

-------------------------------------------------------------------------------

 KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

(1 row)

II. Cluster architecture

1. Initial deployment

No VIP was configured during the initial deployment.

2. Check the cluster node status

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+---------+---------+-----------+----------+----------+----------+----------+

 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

3. Inspect the repmgr.conf file

kingbase@uos01:~/cluster/R6HA/kha/kingbase/etc$ cat repmgr.conf

on_bmj=off

node_id=1

node_name='node238'

promote_command='/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf'

follow_command='/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr  standby follow  -f /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'

conninfo='host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'

log_file='/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log'

data_directory='/home/kingbase/cluster/R6HA/kha/kingbase/data'

sys_bindir='/home/kingbase/cluster/R6HA/kha/kingbase/bin'

ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'

reconnect_attempts=3

reconnect_interval=5

failover='automatic'

recovery='manual'

monitoring_history='no'

trusted_servers='192.168.7.1'

synchronous='quorum'

repmgrd_pid_file='/home/kingbase/cluster/R6HA/kha/kingbase/hamgrd.pid'

ping_path='/usr/bin'

=== The configuration file above contains no virtual_ip entry. ===
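
The same check can be scripted rather than done by eye. The sketch below is only an illustration: the function name `has_virtual_ip` is hypothetical, and it assumes the entry is written as `virtual_ip=...` at the start of a line, which is the form the remark above refers to.

```shell
#!/bin/sh
# Succeed if a repmgr.conf-style file contains an active (uncommented)
# virtual_ip entry. has_virtual_ip is a hypothetical helper name.
has_virtual_ip() {
    conf_file=$1
    # Optional leading blanks, the key, optional blanks, then "=".
    # Lines that start with "#" do not match.
    grep -Eq '^[[:space:]]*virtual_ip[[:space:]]*=' "$conf_file"
}
```

For example, `has_virtual_ip /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf || echo "no VIP configured"` would print the message for the file shown above.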

4. Start the cluster with sys_monitor.sh

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart

2021-03-01 12:07:25 Ready to stop all DB ...

Service process "node_export" was killed at process 12391

Service process "postgres_ex" was killed at process 12392

Service process "node_export" was killed at process 5229

Service process "postgres_ex" was killed at process 5230

2021-03-01 12:07:28 begin to stop repmgrd on "[192.168.7.238]".

2021-03-01 12:07:29 repmgrd on "[192.168.7.238]" stop success.

2021-03-01 12:07:29 begin to stop repmgrd on "[192.168.7.239]".

2021-03-01 12:07:29 repmgrd on "[192.168.7.239]" stop success.

2021-03-01 12:07:29 begin to stop DB on "[192.168.7.239]".

waiting for server to shut down.... done

server stopped

2021-03-01 12:07:30 DB on "[192.168.7.239]" stop success.

2021-03-01 12:07:30 begin to stop DB on "[192.168.7.238]".

waiting for server to shut down.... done

server stopped

2021-03-01 12:07:30 DB on "[192.168.7.238]" stop success.

2021-03-01 12:07:30 Done.

2021-03-01 12:07:30 Ready to start all DB ...

2021-03-01 12:07:30 begin to start DB on "[192.168.7.238]".

waiting for server to start.... done

server started

2021-03-01 12:07:31 execute to start DB on "[192.168.7.238]" success, connect to check it.

2021-03-01 12:07:32 DB on "[192.168.7.238]" start success.

2021-03-01 12:07:32 Try to ping trusted_servers on host 192.168.7.238 ...

2021-03-01 12:07:34 Try to ping trusted_servers on host 192.168.7.239 ...

2021-03-01 12:07:37 begin to start DB on "[192.168.7.239]".

waiting for server to start.... done

server started

2021-03-01 12:07:37 execute to start DB on "[192.168.7.239]" success, connect to check it.

2021-03-01 12:07:38 DB on "[192.168.7.239]" start success.

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+---------+---------+-----------+----------+----------+----------+----------+

 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2021-03-01 12:07:38 The primary DB is started.

2021-03-01 12:07:38 begin to start repmgrd on "[192.168.7.238]".

[2021-03-01 12:07:39] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf"

[2021-03-01 12:07:39] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log"



2021-03-01 12:07:39 repmgrd on "[192.168.7.238]" start success.

2021-03-01 12:07:39 begin to start repmgrd on "[192.168.7.239]".

[2021-03-01 12:07:35] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf"

[2021-03-01 12:07:35] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log"



2021-03-01 12:07:40 repmgrd on "[192.168.7.239]" start success.

 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen

----+---------+---------+-----------+----------+---------+-------+---------+--------------------

 1  | node238 | primary | * running |          | running | 13285 | no      | n/a

 2  | node239 | standby |   running | node238  | running | 5508  | no      | 0 second(s) ago

2021-03-01 12:07:44 Done.

=== The output above shows that cluster startup includes no VIP check. ===

III. Configure the VIP in repmgr.conf (perform on all nodes)

1. Identify the network interface for the VIP

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff

    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3

       valid_lft forever preferred_lft forever

=== The VIP must be configured on the same device that carries the physical IP. ===
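
To find that device without reading the listing by hand, the one-line output format of `ip -o -4 addr show` can be filtered. This is a minimal sketch; `device_for_ip` is a hypothetical helper that reads the command's output on standard input so it can be tried on saved output as well.

```shell
#!/bin/sh
# Print the interface that carries the given IPv4 address.
# Expects `ip -o -4 addr show` output on stdin, where each line looks like:
#   2: enp0s3    inet 192.168.7.238/24 brd 192.168.7.255 scope global ...
device_for_ip() {
    awk -v ip="$1" '
        $3 == "inet" {
            split($4, addr, "/")          # strip the /prefix part
            if (addr[1] == ip) { print $2; exit }
        }'
}
```

On the node above, `ip -o -4 addr show | device_for_ip 192.168.7.238` should name the interface that holds the physical IP.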

2. Determine the paths and permissions of the ip and arping executables

Locate the ip and arping executables:

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ which arping

/usr/bin/arping

root@uos01:~# which ip

/usr/sbin/ip

Check the arping version:

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ arping -V

arping utility, iputils-s20180629

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ls arping

arping

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ which arping

/usr/bin/arping

=== The operating system's arping version is fine. Any version string of the form iputils-xxxx is acceptable. ===

Set the permissions on the ip and arping executables (setuid):

root@uos01:~# ls -lh /usr/bin/arping

-rwxr-xr-x 1 root root 27K Jan 14  2020 /usr/bin/arping

root@uos01:~# ls -lh /usr/bin/ip

-rwxr-xr-x 1 root root 575K Jun  4  2021 /usr/bin/ip

root@uos01:~# chmod 4755 /usr/bin/arping

root@uos01:~# chmod 4755 /usr/sbin/ip

root@uos01:~# ls -lh /usr/bin/arping

-rwsr-xr-x 1 root root 27K Jan 14  2020 /usr/bin/arping

root@uos01:~# ls -lh /usr/sbin/ip

lrwxrwxrwx 1 root root 7 Jun  4  2021 /usr/sbin/ip -> /bin/ip

root@uos01:~# ls -lh /bin/ip

-rwsr-xr-x 1 root root 575K Jun  4  2021 /bin/ip

Notes:

1) The ip command is used to add and remove the VIP.

2) The arping command is used to refresh and test ARP caches during a VIP switchover.
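
The permission check above can be repeated on every node with a small helper. This is a sketch under the assumption that an executable file with the setuid bit is sufficient; it does not verify that the owner is root, which the listings above also show. `is_setuid_exec` is a hypothetical name.

```shell
#!/bin/sh
# Succeed if the path is executable and has the setuid bit set.
# The shell's file tests follow symlinks, so a link such as
# /usr/sbin/ip -> /bin/ip is checked against its target.
is_setuid_exec() {
    [ -x "$1" ] && [ -u "$1" ]
}
```

For example, `is_setuid_exec /usr/bin/arping || echo "arping needs chmod 4755"`.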

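3. Edit the repmgr.conf file

On every node, add the virtual_ip setting (192.168.7.244/24 in this example) and the arping_path setting to repmgr.conf.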
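
The lines for this step could be generated as in the sketch below. Only two parameter names are confirmed by the logs in this article: `virtual_ip` and `arping_path`. The names `net_device` and `ipaddr_path` are assumptions and should be checked against the KingbaseES documentation for your build before use; `vip_conf_lines` is a hypothetical helper.

```shell
#!/bin/sh
# Print repmgr.conf lines for a VIP.
# Arguments: VIP in CIDR form, device name, directory of ip, directory of arping.
vip_conf_lines() {
    vip_cidr=$1; device=$2; ip_dir=$3; arping_dir=$4
    printf "virtual_ip='%s'\n" "$vip_cidr"     # named in this article
    printf "net_device='%s'\n" "$device"       # ASSUMED parameter name
    printf "ipaddr_path='%s'\n" "$ip_dir"      # ASSUMED parameter name
    printf "arping_path='%s'\n" "$arping_dir"  # named in the error message in section VIII
}
```

With the values from this environment, `vip_conf_lines 192.168.7.244/24 enp0s3 /usr/sbin /usr/bin` prints four lines that can be reviewed and then appended to repmgr.conf on each node.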

IV. Restart the cluster (with sys_monitor.sh)

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart

2021-03-01 12:22:39 Ready to stop all DB ...

There is no service "node_export" running currently.

There is no service "postgres_ex" running currently.

There is no service "node_export" running currently.

There is no service "postgres_ex" running currently.

2021-03-01 12:22:42 begin to stop repmgrd on "[192.168.7.238]".

2021-03-01 12:22:43 repmgrd on "[192.168.7.238]" already stopped.

2021-03-01 12:22:43 begin to stop repmgrd on "[192.168.7.239]".

2021-03-01 12:22:43 repmgrd on "[192.168.7.239]" already stopped.

2021-03-01 12:22:43 begin to stop DB on "[192.168.7.239]".

waiting for server to shut down.... done

server stopped

2021-03-01 12:22:44 DB on "[192.168.7.239]" stop success.

2021-03-01 12:22:44 begin to stop DB on "[192.168.7.238]".

waiting for server to shut down.... done

server stopped

2021-03-01 12:22:44 DB on "[192.168.7.238]" stop success.

2021-03-01 12:22:44 Done.

2021-03-01 12:22:44 Ready to start all DB ...

2021-03-01 12:22:44 begin to start DB on "[192.168.7.238]".

waiting for server to start.... done

server started

2021-03-01 12:22:45 execute to start DB on "[192.168.7.238]" success, connect to check it.

2021-03-01 12:22:46 DB on "[192.168.7.238]" start success.

2021-03-01 12:22:46 Try to ping trusted_servers on host 192.168.7.238 ...

2021-03-01 12:22:48 Try to ping trusted_servers on host 192.168.7.239 ...

2021-03-01 12:22:51 begin to start DB on "[192.168.7.239]".

waiting for server to start.... done

server started

2021-03-01 12:22:51 execute to start DB on "[192.168.7.239]" success, connect to check it.

2021-03-01 12:22:52 DB on "[192.168.7.239]" start success.

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+---------+---------+-----------+----------+----------+----------+----------+

 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2021-03-01 12:22:53 The primary DB is started.

2021-03-01 12:22:57 Success to load virtual ip [192.168.7.244/24] on primary host [192.168.7.238].

2021-03-01 12:22:57 Try to ping vip on host 192.168.7.238 ...

2021-03-01 12:22:59 Try to ping vip on host 192.168.7.239 ...

2021-03-01 12:23:02 begin to start repmgrd on "[192.168.7.238]".

[2021-03-01 12:23:02] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf"

[2021-03-01 12:23:02] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log"



2021-03-01 12:23:02 repmgrd on "[192.168.7.238]" start success.

2021-03-01 12:23:02 begin to start repmgrd on "[192.168.7.239]".

[2021-03-01 12:22:58] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf"

[2021-03-01 12:22:58] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log"



2021-03-01 12:23:03 repmgrd on "[192.168.7.239]" start success.

 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen

----+---------+---------+-----------+----------+---------+-------+---------+--------------------

 1  | node238 | primary | * running |          | running | 15043 | no      | n/a

 2  | node239 | standby |   running | node238  | running | 6440  | no      | n/a

2021-03-01 12:23:07 Done.

=== The output above shows that the VIP [192.168.7.244/24] is now loaded when the cluster restarts. ===

V. Verify the cluster status

1. Check that the VIP is loaded

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff

    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3

       valid_lft forever preferred_lft forever

    inet 192.168.7.244/24 scope global secondary enp0s3:3

       valid_lft forever preferred_lft forever

=== The output above shows that the VIP was loaded successfully on the primary node. ===
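
The same check is easy to script for use after each restart or switchover. A minimal sketch, assuming the VIP is passed in the CIDR form that `ip addr` prints; `vip_loaded` is a hypothetical helper that reads `ip addr` output on standard input.

```shell
#!/bin/sh
# Succeed if the given address/prefix appears as an inet entry on stdin.
vip_loaded() {
    # -F treats the pattern as a fixed string; the trailing space keeps
    # 192.168.7.244/24 from matching a longer prefix such as /240.
    grep -qF "inet $1 "
}
```

For example, `ip addr show enp0s3 | vip_loaded 192.168.7.244/24 && echo "VIP is on this node"`.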

2. Check the cluster node status

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+---------+---------+-----------+----------+----------+----------+----------+

 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

3. Connect to the database through the VIP and check streaming replication

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./ksql -h 192.168.7.244 -U system test

ksql (V8.0)

Type "help" for help.



test=# select * from sys_stat_replication;

  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_s

tart         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag |

 replay_lag | sync_priority | sync_state |          reply_time

-------+----------+---------+------------------+---------------+-----------------+

 14935 |    16384 | esrep   | node239          | 192.168.7.239 |                 |       58172 | 2021-03-01 12:22:

51.831920+08 |              | streaming | 0/6000670 | 0/6000670 | 0/6000670 | 0/6000670  |           |           |

            |             1 | quorum     | 2021-03-01 12:24:30.751707+08

(1 row)

VI. Primary/standby switchover test

1. Cluster node status before the switchover

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+---------+---------+-----------+----------+----------+----------+----------+

 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2. Perform the switchover

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr standby switchover --siblings-follow

NOTICE: executing switchover on node "node239" (ID: 2)

WARNING: option "--sibling-nodes" specified, but no sibling nodes exist

INFO: pausing repmgrd on node "node238" (ID 1)

INFO: pausing repmgrd on node "node239" (ID 2)

NOTICE: local node "node239" (ID: 2) will be promoted to primary; current primary "node238" (ID: 1) will be demoted to standby

NOTICE: stopping current primary node "node238" (ID: 1)

NOTICE: issuing CHECKPOINT

NOTICE: node (ID: 1) release the virtual ip 192.168.7.244/24 success

DETAIL: executing server command "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl  -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/7000028

NOTICE: PING 192.168.7.244 (192.168.7.244) 56(84) bytes of data.



--- 192.168.7.244 ping statistics ---

2 packets transmitted, 0 received, 100% packet loss, time 3ms





WARNING: ping host"192.168.7.244" failed

DETAIL: average RTT value is not greater than zero

NOTICE: new primary node (ID: 2) acquire the virtual ip 192.168.7.244/24 success

NOTICE: promoting standby to primary

DETAIL: promoting server "node239" (ID: 2) using sys_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node239" (ID: 2) was successfully promoted to primary

NOTICE: issuing CHECKPOINT

INFO: local node 1 can attach to rejoin target node 2

DETAIL: local node's recovery point: 0/7000028; rejoin target node's fork point: 0/70000A0

NOTICE: setting node 1's upstream to node 2

WARNING: unable to ping "host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: begin to start server at 2021-03-01 12:29:42.971664

NOTICE: starting server using "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl  -w -t 90 -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile start"

NOTICE: start server finish at 2021-03-01 12:29:43.087104

NOTICE: replication slot "repmgr_slot_2" deleted on node 1

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 2

NOTICE: switchover was successful

DETAIL: node "node239" is now primary and node "node238" is attached as standby

INFO: unpausing repmgrd on node "node238" (ID 1)

INFO: unpause node "node238" (ID 1) successfully

INFO: unpausing repmgrd on node "node239" (ID 2)

INFO: unpause node "node239" (ID 2) successfully

NOTICE: STANDBY SWITCHOVER has completed successfully

3. Check where the VIP is loaded after the switchover

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ip add sh

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 08:00:27:c9:c0:27 brd ff:ff:ff:ff:ff:ff

    inet 192.168.7.239/24 brd 192.168.7.255 scope global noprefixroute enp0s3

       valid_lft forever preferred_lft forever

    inet 192.168.7.244/24 scope global secondary enp0s3:3

       valid_lft forever preferred_lft forever

=== The output above shows that the VIP has been loaded on the new primary. ===

4. Check node status after the switchover (the switchover succeeded)

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+---------+---------+-----------+----------+----------+----------+----------+

 1  | node238 | standby |   running | node239  | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node239 | primary | * running |          | default  | 100      | 2        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

5. Check the VIP on the former primary (it has been removed)

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff

    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3

       valid_lft forever preferred_lft forever

VII. Cluster failover test

1. Stop the database service on the primary

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./sys_ctl stop -D ../data

waiting for server to shut down.... done

server stopped

2. Check cluster node status after the failover

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show

 ID | Name    | Role    | Status               | Upstream  | Location | Priority | Timeline | Connection string

----+---------+---------+----------------------+-----------+----------+----------+

 1  | node238 | standby | ! running as primary | ? node239 | default  | 100      | 3        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node239 | primary | ? unreachable        |           | default  | 100      | ?        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected

  - node "node238" (ID: 1) is registered as standby but running as primary

  - unable to connect to node "node238" (ID: 1)'s upstream node "node239" (ID: 2)

  - unable to determine if node "node238" (ID: 1) is attached to its upstream node "node239" (ID: 2)

  - unable to connect to node "node239" (ID: 2)

  - node "node239" (ID: 2) is registered as an active primary but is unreachable

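=== The output above shows that after the database service on the primary went down, a failover occurred: the former standby was promoted to the new primary, and the former primary is reported as "unreachable". ===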
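
When monitoring a failover, it can help to extract only the nodes that `repmgr cluster show` flags with `!` or `?` in the Status column. The sketch below assumes the column layout shown above (Name is the second column, Status the fourth); `flagged_nodes` is a hypothetical helper that reads the command's output on standard input.

```shell
#!/bin/sh
# Print "name: status" for each node whose Status starts with "!" or "?".
flagged_nodes() {
    awk -F'|' '
        NF >= 4 {
            name = $2; status = $4
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status ~ /^[!?]/) print name ": " status
        }'
}
```

For example, `./repmgr cluster show | flagged_nodes` prints nothing for a healthy cluster and one line per problem node otherwise.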

VIII. Errors encountered during configuration

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart

the dir "/sbin" has no execute file "arping", please set [arping_path] in /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart

2021-03-01 12:19:27 Ready to stop all DB ...

Service process "node_export" was killed at process 13382

Service process "postgres_ex" was killed at process 13383

Service process "node_export" was killed at process 5575

Service process "postgres_ex" was killed at process 5576

2021-03-01 12:19:31 begin to stop repmgrd on "[192.168.7.238]".

2021-03-01 12:19:31 repmgrd on "[192.168.7.238]" stop success.

2021-03-01 12:19:31 begin to stop repmgrd on "[192.168.7.239]".

2021-03-01 12:19:32 repmgrd on "[192.168.7.239]" stop success.

2021-03-01 12:19:32 begin to stop DB on "[192.168.7.239]".

incorrect command permissions for the virtual ip.

waiting for server to shut down.... done

server stopped

2021-03-01 12:19:33 DB on "[192.168.7.239]" stop success.

2021-03-01 12:19:33 begin to stop DB on "[192.168.7.238]".

incorrect command permissions for the virtual ip.

waiting for server to shut down.... done

server stopped

2021-03-01 12:19:33 DB on "[192.168.7.238]" stop success.

2021-03-01 12:19:33 Done.

2021-03-01 12:19:33 Ready to start all DB ...

2021-03-01 12:19:33 begin to start DB on "[192.168.7.238]".

incorrect command permissions for the virtual ip.

waiting for server to start.... done

server started

2021-03-01 12:19:34 execute to start DB on "[192.168.7.238]" success, connect to check it.

2021-03-01 12:19:35 DB on "[192.168.7.238]" start success.

2021-03-01 12:19:35 Try to ping trusted_servers on host 192.168.7.238 ...

2021-03-01 12:19:37 Try to ping trusted_servers on host 192.168.7.239 ...

2021-03-01 12:19:40 begin to start DB on "[192.168.7.239]".

incorrect command permissions for the virtual ip.

waiting for server to start.... done

server started

2021-03-01 12:19:40 execute to start DB on "[192.168.7.239]" success, connect to check it.

2021-03-01 12:19:41 DB on "[192.168.7.239]" start success.

ERROR: No execute permission for "/usr/sbin/ip"

incorrect command permissions for the virtual ip.

2021-03-01 12:19:42 There is no primary DB running, will do nothing and exit.

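=== The errors above occurred because the path of the arping executable was not set in the configuration file, and because the ip and arping executables did not have the setuid permission. ===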
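
The first error arises because the configured directory does not contain arping. Before restarting, the right directory can be found with a sketch like the one below; `find_exec_dir` is a hypothetical helper and the list of candidate directories is an assumption.

```shell
#!/bin/sh
# Print the first directory in the list that contains an executable
# regular file with the given name; fail if none does.
find_exec_dir() {
    name=$1; shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

For example, `find_exec_dir arping /sbin /usr/sbin /bin /usr/bin` prints the directory to use for arping_path on a system where arping is installed in one of those locations.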

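IX. Summary of steps:

1) Choose the VIP address; it must be in the same subnet as the physical IP and must not already be in use.  
2) Check the paths of the arping and ip executables and the arping version.  
3) Set the setuid permission (the s bit) on the ip and arping executables.  
4) Add the VIP settings to repmgr.conf.  
5) Restart the cluster and verify its status.  
6) Test primary/standby switchover.  
7) Test application access through the VIP.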
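
The subnet requirement in step 1 can be checked with plain shell arithmetic. The sketch below covers IPv4 only and does not test whether the address is already in use; `ip_to_int` and `same_subnet` are hypothetical helper names.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1                 # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if two addresses share the same network for the given prefix length.
same_subnet() {
    prefix=$3
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

For the addresses in this article, `same_subnet 192.168.7.238 192.168.7.244 24` succeeds, so 192.168.7.244 is a valid candidate as far as the subnet is concerned.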