KingbaseES Operations Case Study: Terminating a Server Process (Backend Process)
Case description:
Figure: KingbaseES server process structure (image not included)

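KingbaseES uses a client/server model. When the KingbaseES main process accepts a client connection, it creates a new server process (backend process) for that connection. This process handles all database requests from that client and exits when the connection is closed. For example, the client query session below is served by its own kingbase backend process: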

The corresponding backend process at the operating-system level (image not included):
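A minimal sketch for picking client backend processes out of ps -ef output. The function name list_client_backends is hypothetical, and the filter assumes the ps column layout and process-title format shown in the listings later in this article (user, database, client address with port or [local], state); other platforms or versions may differ.

```shell
# Print "pid dbuser dbname client" for each client backend found on stdin.
# Assumes "ps -ef" layout: the process title starts at field 8 ("kingbase:"),
# and field 11 is either "addr(port)" or "[local]" for client backends.
list_client_backends() {
    awk '$8 == "kingbase:" && ($11 ~ /\([0-9]+\)$/ || $11 == "[local]") {
        print $2, $9, $10, $11
    }'
}

# Usage: ps -ef | list_client_backends
```

Auxiliary processes such as the checkpointer or the logical replication launcher have no client field in their title, so the filter skips them.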

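When a client finishes and closes its connection normally, the corresponding kingbase backend process also exits. When a client exits abnormally, however, the backend process on the database side may not terminate and continues to hold database resources. This case study describes how to terminate such a backend process manually. You can do this with a database tool or with the operating system's kill command, but the methods differ in their impact on the database.
Applicable versions: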
KingbaseES V8R3/R6
System architecture: (diagram not included)

I. Client access
[kingbase@node1 bin]$ ./ksql -h 192.168.8.201 -U system -W prod
Password:
ksql (V8.0)
Type "help" for help.
prod=# select count(*) from t1;
count
--------
100000
(1 row)
II. Backend process termination methods
1. pg_terminate_backend(pid)
Tips: The pg_terminate_backend() function actually sends a SIGTERM signal to the process.
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 17100 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57476 | 2022-11-29 15:04:57.131987+08 | | 2022-11-29 15
:05:05.526379+08 | 2022-11-29 15:05:05.539018+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Terminate the backend process by its pid
prod=# select pg_terminate_backend(17100);
pg_terminate_backend
----------------------
t
(1 row)
# The process has been terminated
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
# As shown below, the database processes are running normally and the backend process has been terminated safely
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 16474 13089 0 14:59 ? 00:00:00 kingbase: checkpointer
kingbase 16475 13089 0 14:59 ? 00:00:00 kingbase: background writer
kingbase 16476 13089 0 14:59 ? 00:00:00 kingbase: walwriter
kingbase 16477 13089 0 14:59 ? 00:00:00 kingbase: autovacuum launcher
kingbase 16478 13089 0 14:59 ? 00:00:00 kingbase: stats collector
kingbase 16479 13089 0 14:59 ? 00:00:00 kingbase: ksh writer
kingbase 16480 13089 0 14:59 ? 00:00:00 kingbase: ksh collector
kingbase 16481 13089 0 14:59 ? 00:00:00 kingbase: kwr collector
kingbase 16482 13089 0 14:59 ? 00:00:00 kingbase: logical replication launcher
kingbase 17100 13089 0 15:04 ? 00:00:00 kingbase: system prod 192.168.8.200(57476) idle
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 17214 13089 0 15:05 ? 00:00:00 kingbase: system prod [local] idle
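The query-then-terminate steps above can be combined into one statement. The sketch below only builds that SQL text; the function name terminate_idle_sql is hypothetical, and pg_backend_pid() (used to skip the current session) is assumed to be available as it is in PostgreSQL.

```shell
# Print a SQL statement that terminates idle backends from one client address.
# Assumption: pg_backend_pid() exists, as in PostgreSQL; it excludes our own session.
terminate_idle_sql() {
    addr=$1
    # Accept only characters that can appear in an IPv4/IPv6 address.
    case $addr in
        ''|*[!0-9A-Fa-f.:]*)
            echo "invalid client address: $addr" >&2
            return 1 ;;
    esac
    printf "select pid, pg_terminate_backend(pid) from sys_stat_activity where client_addr = '%s' and state = 'idle' and pid <> pg_backend_pid();\n" "$addr"
}

# Hypothetical usage (assumes ksql reads statements from standard input, as psql does):
#   terminate_idle_sql 192.168.8.200 | ./ksql -h 192.168.8.201 -U system -W prod
```

Restricting the statement to state = 'idle' is a design choice for this sketch: it leaves sessions that are running a query untouched, so review the generated statement before running it.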
2. Operating system kill pid
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 18424 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57480 | 2022-11-29 15:15:26.813035+08 | | 2022-11-29 15
:15:28.912910+08 | 2022-11-29 15:15:28.922719+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Run kill pid at the operating system level to end the process
[root@node2 sys_log]# kill 18424
# As shown below, the database processes are running normally and the backend process has been killed safely
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 16474 13089 0 14:59 ? 00:00:00 kingbase: checkpointer
kingbase 16475 13089 0 14:59 ? 00:00:00 kingbase: background writer
kingbase 16476 13089 0 14:59 ? 00:00:00 kingbase: walwriter
kingbase 16477 13089 0 14:59 ? 00:00:00 kingbase: autovacuum launcher
kingbase 16478 13089 0 14:59 ? 00:00:00 kingbase: stats collector
kingbase 16479 13089 0 14:59 ? 00:00:00 kingbase: ksh writer
kingbase 16480 13089 0 14:59 ? 00:00:00 kingbase: ksh collector
kingbase 16481 13089 0 14:59 ? 00:00:00 kingbase: kwr collector
kingbase 16482 13089 0 14:59 ? 00:00:00 kingbase: logical replication launcher
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 17214 13089 0 15:05 ? 00:00:00 kingbase: system prod [local] idle
# The database view no longer contains a record for this backend process
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
3. Operating system kill -15 pid and sys_ctl kill TERM pid
1) Operating system kill -15 pid
# Query the backend process status
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 22955 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57498 | 2022-11-29 15:44:29.993305+08 | | 2022-11-29 15
:44:32.090913+08 | 2022-11-29 15:44:32.100617+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
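# Run kill -15 pid at the operating system level to end the process
[kingbase@node2 bin]$ kill -15 22955
# As shown below, the database processes are running normally and the backend process has been killed safely
[kingbase@node2 bin]$ ps -ef |grep kingbase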
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 22197 13089 0 15:38 ? 00:00:00 kingbase: checkpointer
kingbase 22198 13089 0 15:38 ? 00:00:00 kingbase: background writer
kingbase 22199 13089 0 15:38 ? 00:00:00 kingbase: walwriter
kingbase 22200 13089 0 15:38 ? 00:00:00 kingbase: autovacuum launcher
kingbase 22201 13089 0 15:38 ? 00:00:00 kingbase: stats collector
kingbase 22202 13089 0 15:38 ? 00:00:00 kingbase: ksh writer
kingbase 22203 13089 0 15:38 ? 00:00:00 kingbase: ksh collector
kingbase 22204 13089 0 15:38 ? 00:00:00 kingbase: kwr collector
kingbase 22205 13089 0 15:38 ? 00:00:00 kingbase: logical replication launcher
kingbase 22444 12416 0 15:40 pts/0 00:00:00 ./ksql -U system test
kingbase 22445 13089 0 15:40 ? 00:00:00 kingbase: system test [local] idle
# The database view no longer contains a record for this backend process
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
2) sys_ctl kill TERM pid
# Query the backend process status
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 22443 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57494 | 2022-11-29 15:40:42.804868+08 | | 2022-11-29 15
:40:44.972533+08 | 2022-11-29 15:40:44.985340+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Kill the process with the database command sys_ctl
[kingbase@node2 bin]$ ./sys_ctl kill TERM 22443
# As shown below, the database processes are running normally and the backend process has been killed safely
[kingbase@node2 bin]$ ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 22197 13089 0 15:38 ? 00:00:00 kingbase: checkpointer
kingbase 22198 13089 0 15:38 ? 00:00:00 kingbase: background writer
kingbase 22199 13089 0 15:38 ? 00:00:00 kingbase: walwriter
kingbase 22200 13089 0 15:38 ? 00:00:00 kingbase: autovacuum launcher
kingbase 22201 13089 0 15:38 ? 00:00:00 kingbase: stats collector
kingbase 22202 13089 0 15:38 ? 00:00:00 kingbase: ksh writer
kingbase 22203 13089 0 15:38 ? 00:00:00 kingbase: ksh collector
kingbase 22204 13089 0 15:38 ? 00:00:00 kingbase: kwr collector
kingbase 22205 13089 0 15:38 ? 00:00:00 kingbase: logical replication launcher
kingbase 22444 12416 0 15:40 pts/0 00:00:00 ./ksql -U system test
kingbase 22445 13089 0 15:40 ? 00:00:00 kingbase: system test [local] idle
# The database view no longer contains a record for this backend process
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
4. Operating system kill -3 pid and sys_ctl kill QUIT pid
1) Operating system kill -3 pid
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
| backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+-------------------------------------------------------
--------------+----------------
16385 | prod | 18666 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57486 | 2022-11-29 15:17:34.726155+08 | | 2022-11-29 15
:17:34.736202+08 | 2022-11-29 15:17:34.740584+08 | Client | ClientRead | idle | | | select setting from pg_settings where name = 'enable_u
pper_colname' | client backend
(1 row)
# Run kill -3 pid at the operating system level to end the process
[root@node2 sys_log]# kill -3 18666
# As shown below, besides the target backend process, all other background auxiliary processes are also terminated and then restarted
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 18795 13089 0 15:18 ? 00:00:00 kingbase: startup
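# The connected client session is dropped, and sys_log shows that terminating the backend process forced the other server processes to restart as well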
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
WARNING: terminating connection because of crash of another server process
DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2022-11-29 15:18:06.741 CST [18666] WARNING: terminating connection because of crash of another server process
2022-11-29 15:18:06.741 CST [18666] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:18:06.741 CST [18666] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:18:06.742 CST [13089] LOG: server process (PID 18666) exited with exit code 2
2022-11-29 15:18:06.742 CST [13089] DETAIL: Failed process was running: select setting from pg_settings where name = 'enable_upper_colname'
2022-11-29 15:18:06.742 CST [13089] LOG: terminating any other active server processes
2022-11-29 15:18:06.743 CST [17214] WARNING: terminating connection because of crash of another server process
2022-11-29 15:18:06.743 CST [17214] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:18:06.743 CST [17214] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:18:06.744 CST [16477] WARNING: terminating connection because of crash of another server process
2022-11-29 15:18:06.744 CST [16477] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:18:06.744 CST [16477] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:18:06.745 CST [13089] LOG: all server processes terminated; reinitializing
2022-11-29 15:18:06.823 CST [18795] LOG: database system was interrupted; last known up at 2022-11-29 14:59:17 CST
2022-11-29 15:19:29.897 CST [18795] LOG: database system was not properly shut down; automatic recovery in progress
2022-11-29 15:19:30.063 CST [18795] LOG: redo starts at 0/22935D8
2022-11-29 15:19:30.063 CST [18795] LOG: redo wal segment count 2
2022-11-29 15:19:30.063 CST [18795] LOG: invalid record length at 0/2293608: wanted 24, got 0
2022-11-29 15:19:30.063 CST [18795] LOG: redo done at 0/22935D8
2022-11-29 15:19:30.739 CST [13089] LOG: database system is ready to accept connections
# After the database service has restarted
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 18953 13089 0 15:19 ? 00:00:00 kingbase: checkpointer
kingbase 18954 13089 0 15:19 ? 00:00:00 kingbase: background writer
kingbase 18955 13089 0 15:19 ? 00:00:00 kingbase: walwriter
kingbase 18957 13089 0 15:19 ? 00:00:00 kingbase: stats collector
kingbase 18958 13089 0 15:19 ? 00:00:00 kingbase: ksh writer
kingbase 18959 13089 0 15:19 ? 00:00:00 kingbase: ksh collector
kingbase 18960 13089 0 15:19 ? 00:00:00 kingbase: kwr collector
---As shown above, using kill -3 pid to terminate a backend process puts the database at serious risk.
2) sys_ctl kill QUIT pid
# Query the backend process status
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 21894 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57490 | 2022-11-29 15:36:10.034020+08 | | 2022-11-29 15
:36:13.902728+08 | 2022-11-29 15:36:13.917841+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Run the database command sys_ctl kill QUIT to terminate the backend process
[kingbase@node2 bin]$ ./sys_ctl kill QUIT 21894
# As shown below, besides the target backend process, all other background auxiliary processes are also terminated and then restarted
[kingbase@node2 bin]$ ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 21895 12416 0 15:36 pts/0 00:00:00 ./ksql -U system test
kingbase 22071 13089 0 15:37 ? 00:00:00 kingbase: startup
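# The connected client session is dropped, and sys_log shows that terminating the backend process forced the other server processes to restart as well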
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
WARNING: terminating connection because of crash of another server process
DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2022-11-29 15:37:13.828 CST [21894] WARNING: terminating connection because of crash of another server process
2022-11-29 15:37:13.828 CST [21894] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:37:13.828 CST [21894] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:37:13.829 CST [13089] LOG: server process (PID 21894) exited with exit code 2
2022-11-29 15:37:13.829 CST [13089] DETAIL: Failed process was running: select count(*) from t1;
2022-11-29 15:37:13.829 CST [13089] LOG: terminating any other active server processes
2022-11-29 15:37:13.830 CST [21896] WARNING: terminating connection because of crash of another server process
2022-11-29 15:37:13.830 CST [21896] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:37:13.830 CST [21896] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:37:13.831 CST [18956] WARNING: terminating connection because of crash of another server process
2022-11-29 15:37:13.831 CST [18956] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:37:13.831 CST [18956] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:37:13.833 CST [13089] LOG: all server processes terminated; reinitializing
2022-11-29 15:37:13.933 CST [22071] LOG: database system was interrupted; last known up at 2022-11-29 15:19:30 CST
2022-11-29 15:38:29.154 CST [22071] LOG: database system was not properly shut down; automatic recovery in progress
2022-11-29 15:38:29.232 CST [22071] LOG: redo starts at 0/2293680
2022-11-29 15:38:29.232 CST [22071] LOG: redo wal segment count 2
2022-11-29 15:38:29.232 CST [22071] LOG: invalid record length at 0/22936B0: wanted 24, got 0
2022-11-29 15:38:29.232 CST [22071] LOG: redo done at 0/2293680
2022-11-29 15:38:29.767 CST [13089] LOG: database system is ready to accept connections
---As shown above, using sys_ctl kill QUIT pid to terminate a backend process puts the database at serious risk.
5. Operating system kill -9 pid
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start
| query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
| backend_type
-------+---------+-------+----------+---------+-------------------+---------------+-----------------+-------------+-------------------------------+--------------------------
-----+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+-------------------------
---------+------------------------------
16385 | prod | 14114 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57472 | 2022-11-29 14:48:06.372451+08 |
| 2022-11-29 14:50:16.618969+08 | 2022-11-29 14:50:16.627572+08 | Client | ClientRead | idle | | | select count(*) from t1;
| client backend
(1 row)
# Run kill -9 pid at the operating system level to end the process
[root@node2 ~]# kill -9 14114
# As shown below, besides the target backend process, all other background auxiliary processes are also terminated and then restarted
[root@node2 ~]# ps -ef |grep kingbase
.......
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 14010 12416 0 14:47 pts/0 00:00:00 ./ksql -U system test
kingbase 15651 13089 0 14:57 ? 00:00:00 kingbase: startup
# Check the corresponding sys_log entries (terminating the backend process forces the database auxiliary processes to restart as well)
2022-11-29 14:58:00.093 CST [13089] LOG: server process (PID 14114) was terminated by signal 9: Killed
2022-11-29 14:58:00.093 CST [13089] DETAIL: Failed process was running: select count(*) from t1;
2022-11-29 14:58:00.093 CST [13089] LOG: terminating any other active server processes
2022-11-29 14:58:00.093 CST [14602] WARNING: terminating connection because of crash of another server process
2022-11-29 14:58:00.093 CST [14602] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 14:58:00.093 CST [14602] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 14:58:00.095 CST [13095] WARNING: terminating connection because of crash of another server process
2022-11-29 14:58:00.095 CST [13095] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 14:58:00.095 CST [13095] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 14:58:00.384 CST [13089] LOG: all server processes terminated; reinitializing
2022-11-29 14:58:00.464 CST [15651] LOG: database system was interrupted; last known up at 2022-11-29 14:53:42 CST
2022-11-29 14:59:16.706 CST [15651] LOG: database system was not properly shut down; automatic recovery in progress
2022-11-29 14:59:16.806 CST [15651] LOG: redo starts at 0/2293488
2022-11-29 14:59:16.806 CST [15651] LOG: redo wal segment count 2
2022-11-29 14:59:16.806 CST [15651] LOG: invalid record length at 0/2293560: wanted 24, got 0
2022-11-29 14:59:16.806 CST [15651] LOG: redo done at 0/2293530
2022-11-29 14:59:17.563 CST [13089] LOG: database system is ready to accept connections
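---As shown above, using kill -9 pid to terminate a backend process puts the database at serious risk.
III. Summary
Abnormal backend processes in a KingbaseES database can be terminated manually, but each method carries a different level of risk to the database:
1) Relatively safe: pg_terminate_backend(pid); kill pid; kill -15 pid; sys_ctl kill TERM pid.
2) Unsafe: kill -3 pid; sys_ctl kill QUIT pid; kill -9 pid.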
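The classification above can be encoded in a small guard so that an unsafe signal is never sent by accident. This is only a sketch: backend_signal_risk and safe_backend_kill are hypothetical helper names, not part of KingbaseES, and the mapping simply mirrors the results observed in this case study.

```shell
# Classify a signal (name or number) by its observed effect on a backend process.
backend_signal_risk() {
    case $1 in
        TERM|SIGTERM|15) echo safe ;;
        QUIT|SIGQUIT|3|KILL|SIGKILL|9) echo unsafe ;;
        *) echo unknown ;;
    esac
}

# Send SIGTERM to a pid, but refuse any signal not classified as safe.
safe_backend_kill() {
    sig=$1
    pid=$2
    if [ "$(backend_signal_risk "$sig")" != safe ]; then
        echo "refusing to send $sig to $pid: not classified as safe" >&2
        return 1
    fi
    kill -s TERM "$pid"
}

# Usage: safe_backend_kill TERM 17100
```

Only SIGTERM passes the check, which matches what pg_terminate_backend(pid), kill pid, kill -15 pid and sys_ctl kill TERM pid all deliver.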
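Note: Never use kill -9. SIGKILL cannot be caught by a signal handler, so the OS stops the process immediately. When the Kingbase parent process detects that a child exited abnormally, it stops all server processes, releases shared memory, allocates shared memory again, and restarts all processes. The effect is equivalent to a crash restart: startup has to replay redo, which takes time and can interrupt service for several minutes. (Do not use kill -9 unless that outcome is acceptable.)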
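The difference between a catchable and an uncatchable signal can be seen with a generic shell worker. This is an illustration of signal handling in general, not KingbaseES code; demo_signal and the marker files are hypothetical names used only here.

```shell
# Start a worker that cleans up on SIGTERM, send it the given signal,
# and report whether the cleanup handler had a chance to run.
demo_signal() {
    sig=$1
    dir=$(mktemp -d)
    (
        trap 'touch "$dir/cleaned"; exit 0' TERM   # handler: orderly cleanup
        touch "$dir/ready"                         # tell the parent the trap is set
        while :; do sleep 1; done
    ) >/dev/null 2>&1 &
    pid=$!
    while [ ! -e "$dir/ready" ]; do sleep 1; done  # wait until the worker is ready
    kill -s "$sig" "$pid"
    wait "$pid" 2>/dev/null || true                # reap the worker
    if [ -e "$dir/cleaned" ]; then
        echo "$sig: cleanup ran"
    else
        echo "$sig: no cleanup"
    fi
    rm -rf "$dir"
}

demo_signal TERM
demo_signal KILL
```

With TERM the worker runs its handler and exits on its own terms; with KILL it is removed without any chance to clean up, which is why the parent process in the logs above has to assume shared memory may be corrupted and reinitialize.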