Case description:

When a KingbaseES R6 cluster is started, the error "incorrect command permissions for the virtual ip" is reported. This case describes how to analyze and resolve the problem.

Database version:

test=# select version();
version
----------------------------------------------------------------------------------------------------------------------
KingbaseES V008R006C005B0023 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

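Cluster architecture: two nodes, node243 (192.168.7.243, primary) and node248 (192.168.7.248, standby), with the virtual IP 192.168.7.241/24.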

I. Cluster startup fails

[kingbase@node3 bin]$ ./sys_monitor.sh start
2021-03-01 13:27:26 Ready to start all DB ...
2021-03-01 13:27:26 begin to start DB on "[192.168.7.243]".
incorrect command permissions for the virtual ip.
waiting for server to start..... done
server started
2021-03-01 13:27:30 execute to start DB on "[192.168.7.243]" success, connect to check it.
2021-03-01 13:27:31 DB on "[192.168.7.243]" start success.
2021-03-01 13:27:32 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 13:27:34 Try to ping trusted_servers on host 192.168.7.243 ...
2021-03-01 13:27:37 begin to start DB on "[192.168.7.248]".
incorrect command permissions for the virtual ip.
waiting for server to start..... done
server started
2021-03-01 13:27:40 execute to start DB on "[192.168.7.248]" success, connect to check it.
2021-03-01 13:27:41 DB on "[192.168.7.248]" start success.
ERROR: No execute permission for "/home/kingbase/cluster/R6C5/R6C5R//kingbase/bin/arping"
incorrect command permissions for the virtual ip.
2021-03-01 13:27:41 There is no primary DB running, will do nothing and exit.

=The error messages above show that a permission problem occurred when arping was invoked while loading the VIP.=
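
When the startup output is long, it can help to pull out just the path that the permission error names. The following is a minimal sketch, assuming the message format is exactly the one shown in the log above; the function name `denied_paths` is hypothetical and is not part of KingbaseES.

```shell
#!/bin/sh
# Print the path named in each
#   ERROR: No execute permission for "<path>"
# line read from standard input (for example, a saved startup log).
# The message format is taken from the log in this case; other
# versions may word it differently.
denied_paths() {
    sed -n 's/^ERROR: No execute permission for "\(.*\)"[[:space:]]*$/\1/p'
}
```

For example, if the output of `./sys_monitor.sh start` has been saved to a file (say `start.log`, a name chosen here for illustration), `denied_paths < start.log` would print the arping path reported in the error.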

II. Fault analysis

1. Check the repmgr configuration

[kingbase@node3 bin]$ cat ../etc/repmgr.conf
on_bmj=off
node_id=1
node_name='node243'
promote_command='/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/hamgr.log'
kbha_log_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/kbha.log'
data_directory='/home/kingbase/cluster/R6C5/R6C5R/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=10
reconnect_interval=6
failover='automatic'
recovery='standby'
monitoring_history='no'
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.241/24'
net_device='enp0s3'
net_device_ip='192.168.7.243'
ipaddr_path='/sbin'
arping_path='/home/kingbase/cluster/R6C5/R6C5R//kingbase/bin'
synchronous='sync'
repmgrd_pid_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/hamgrd.pid'
kbha_pid_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/kbha.pid'
ping_path='/usr/bin'
auto_cluster_recovery_level=1
use_check_disk=off

=In this version, the arping in use is the tool bundled with the database package, as arping_path shows.=
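
To confirm which arping binary the cluster will call without reading the whole file, the `arping_path` setting can be extracted from repmgr.conf. This is a minimal sketch, assuming the `key='value'` line format shown above; `conf_value` is a hypothetical helper name, not a KingbaseES or repmgr command.

```shell
#!/bin/sh
# Print the value of one key from a repmgr.conf-style file.
# Handles lines of the form  key='value'  or  key=value ;
# if the key appears more than once, the last occurrence wins.
# $1 = key name, $2 = path to the configuration file
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*'\{0,1\}\([^']*\)'\{0,1\}[[:space:]]*\$/\1/p" "$2" | tail -n 1
}
```

With the configuration above, `conf_value arping_path ../etc/repmgr.conf` would print the directory that holds the bundled arping, and `conf_value ipaddr_path ../etc/repmgr.conf` would print `/sbin`.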

2. Check the arping version

3. Check the arping permissions

[kingbase@node1 bin]$ ls -lh arping
-rwxr-xr-x 1 kingbase root 11K Nov 5 2021 arping
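
The `ls -lh` output shows the owner, group, and mode in one line; when the same check has to be repeated on several nodes, a small helper that prints those fields explicitly can be easier to compare. This is a sketch under the assumption that GNU `stat` (with the `-c` option) is available, as on most Linux distributions; `describe_binary` is a hypothetical name.

```shell
#!/bin/sh
# Print owner, group, octal mode and setuid state of a file.
# Requires GNU stat (the -c format option).
# $1 = path to the file to inspect
describe_binary() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing: $f"
        return 1
    fi
    owner=$(stat -c '%U' "$f")
    group=$(stat -c '%G' "$f")
    mode=$(stat -c '%a' "$f")
    # test -u is true when the set-user-ID bit is set
    if [ -u "$f" ]; then setuid=yes; else setuid=no; fi
    echo "owner=$owner group=$group mode=$mode setuid=$setuid"
}
```

Run from the bin directory as `describe_binary arping`, it reports the same information as the listing above, with the setuid state spelled out.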

III. Resolution steps

1. Set the owner of arping to the kingbase user

1) Set the ownership

[kingbase@node1 bin]$ chown -R kingbase.kingbase arping
[kingbase@node1 bin]$ ls -lh arping
-rwxr-xr-x 1 kingbase kingbase 11K Nov 5 2021 arping

2) Start the cluster (the failure persists)

2. Set the owner of arping to root and grant the setuid bit

1) Set the ownership and permissions

[root@node3 ~]# cd /home/kingbase/cluster/R6C5/R6C5R//kingbase/bin
[root@node3 bin]# chown -R root.root arping
[root@node3 bin]# chmod u+s arping
[root@node3 bin]# ls -lh arping
-rwsr-xr-x 1 root root 11K Nov 5 2021 arping
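
Before restarting the cluster, the result of the `chown` and `chmod u+s` steps can be verified with a pass/fail check. The sketch below encodes the conditions this case arrives at (owned by root, setuid bit set, executable by the current user); it is an illustration, not the check that sys_monitor.sh itself performs, and `vip_tool_ready` is a hypothetical name. GNU `stat` is assumed.

```shell
#!/bin/sh
# Return 0 only if the file is a regular file owned by the expected
# UID (default 0, i.e. root), has the setuid bit set, and is
# executable by the user running the check.
# $1 = path to the binary, $2 = expected owner UID (optional)
vip_tool_ready() {
    f=$1
    want_uid=${2:-0}
    [ -f "$f" ] || { echo "not a regular file: $f"; return 1; }
    [ "$(stat -c '%u' "$f")" = "$want_uid" ] || { echo "owner uid is not $want_uid: $f"; return 1; }
    [ -u "$f" ] || { echo "setuid bit not set: $f"; return 1; }
    [ -x "$f" ] || { echo "not executable by current user: $f"; return 1; }
    echo "ok: $f"
}
```

Run as the kingbase user, `vip_tool_ready arping` in the bin directory should print `ok: arping` after the change above, and would have reported `setuid bit not set` before it.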

2) Start the cluster

[kingbase@node3 bin]$ ./sys_monitor.sh start
2021-03-01 13:38:04 Ready to start all DB ...
2021-03-01 13:38:04 begin to start DB on "[192.168.7.243]".
2021-03-01 13:38:05 DB on "[192.168.7.243]" already started, connect to check it.
2021-03-01 13:38:06 DB on "[192.168.7.243]" start success.
2021-03-01 13:38:06 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 13:38:08 Try to ping trusted_servers on host 192.168.7.243 ...
2021-03-01 13:38:11 begin to start DB on "[192.168.7.248]".
2021-03-01 13:38:12 DB on "[192.168.7.248]" already started, connect to check it.
2021-03-01 13:38:13 DB on "[192.168.7.248]" start success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node243 | primary | * running | | default | 100 | 3 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node248 | standby | running | node243 | default | 100 | 3 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2021-03-01 13:38:13 The primary DB is started.
2021-03-01 13:38:13 check synchronous_standby_names ...
t
2021-03-01 13:38:24 Success to load virtual ip [192.168.7.241/24] on primary host [192.168.7.243].
2021-03-01 13:38:24 Try to ping vip on host 192.168.7.248 ...
2021-03-01 13:38:26 Try to ping vip on host 192.168.7.243 ...
2021-03-01 13:38:29 begin to start repmgrd on "[192.168.7.248]".
[2021-03-01 13:40:52] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 13:40:52] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/hamgr.log"
2021-03-01 13:38:30 execute to start repmgrd on "[192.168.7.248]" failed.
2021-03-01 13:38:30 begin to start repmgrd on "[192.168.7.243]".
[2021-03-01 13:38:30] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 13:38:30] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/hamgr.log"
2021-03-01 13:38:32 repmgrd on "[192.168.7.243]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+-------------+-------+---------+--------------------
1 | node243 | primary | * running | | running | 12552 | no | n/a
2 | node248 | standby | running | node243 | not running | n/a | n/a | n/a
[2021-03-01 13:40:56] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/kbha.log"
[2021-03-01 13:38:37] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/kbha.log"
2021-03-01 13:38:39 Done.

[kingbase@node3 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------------------------------
1 | node243 | primary | * running | | default | 100 | 3 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node248 | standby | running | node243 | default | 100 | 3 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

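=== The output above shows that the cluster started successfully: both database nodes are running and the VIP was loaded on the primary. Note that the log also reports that repmgrd failed to start on 192.168.7.248, which this case does not examine. ===

IV. Summary

A KingbaseES R6 cluster uses the arping binary bundled with the database software, so version mismatches generally do not occur. The arping binary should be owned by root rather than by the kingbase user, and the setuid bit must be set on it so that the kingbase user can still run it.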
