KingbaseES Operations Case: Terminating a Server Process (Backend Process)
Case description:
Figure: KingbaseES server process structure

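KingbaseES uses a client/server model. For each client connection, the KingbaseES main process accepts the connection and creates a new server process (backend process) for it. That server process handles the client's database requests and exits when the connection is closed. For example, a client query session such as the following is served by a corresponding kingbase server process:

The corresponding server process (backend process) as seen by the operating system:

When a client finishes its work and disconnects normally, its kingbase server process ends as well. If the client exits abnormally, however, the server process on the database side may not end and continues to hold database resources. This case describes how to terminate such a backend process manually. A backend process can be ended either with database tools or with the operating system's kill command, but the different methods affect the database in different ways.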
Applicable versions:
KingbaseES V8R3/R6
System architecture:

I. Client access
[kingbase@node1 bin]$ ./ksql -h 192.168.8.201 -U system -W prod
Password:
ksql (V8.0)
Type "help" for help.
prod=# select count(*) from t1;
count
--------
100000
(1 row)
II. Options for terminating a backend process
1. Using pg_terminate_backend(pid)
Tips: the pg_terminate_backend() function actually sends a SIGTERM signal to the process.
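Why SIGTERM is the gentle option can be illustrated outside the database. The sketch below is not KingbaseES code: it starts a stand-in child process (the `CHILD` script and the `stop_child` helper are hypothetical names introduced for illustration) that installs a SIGTERM handler and exits in an orderly way, then contrasts it with SIGKILL (the signal sent by kill -9, covered in a later section), which a process cannot catch. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Stand-in for a server process: it catches SIGTERM and shuts down cleanly.
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    # Orderly shutdown path: clean up, then exit with status 0.
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)   # tell the parent the handler is installed
while True:
    time.sleep(0.1)
"""

def stop_child(sig):
    """Start the stand-in child, send it `sig`, and return its exit status."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()        # wait until the child has printed "ready"
    proc.send_signal(sig)
    status = proc.wait(timeout=10)
    proc.stdout.close()
    return status

if __name__ == "__main__":
    # A handled SIGTERM yields a normal exit status; SIGKILL is reported by
    # subprocess as the negative signal number because no handler could run.
    print("SIGTERM ->", stop_child(signal.SIGTERM))
    print("SIGKILL ->", stop_child(signal.SIGKILL))
```

With SIGTERM the child gets a chance to run its own shutdown code; with SIGKILL the operating system removes it without running any of it.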
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 17100 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57476 | 2022-11-29 15:04:57.131987+08 | | 2022-11-29 15
:05:05.526379+08 | 2022-11-29 15:05:05.539018+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Terminate the backend process by its pid
prod=# select pg_terminate_backend(17100);
pg_terminate_backend
----------------------
t
(1 row)
# The process has been terminated
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
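# As shown below, the database processes are running normally and the backend process was terminated safely (PID 17100 is still listed here; this listing appears to have been captured before the termination)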
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 16474 13089 0 14:59 ? 00:00:00 kingbase: checkpointer
kingbase 16475 13089 0 14:59 ? 00:00:00 kingbase: background writer
kingbase 16476 13089 0 14:59 ? 00:00:00 kingbase: walwriter
kingbase 16477 13089 0 14:59 ? 00:00:00 kingbase: autovacuum launcher
kingbase 16478 13089 0 14:59 ? 00:00:00 kingbase: stats collector
kingbase 16479 13089 0 14:59 ? 00:00:00 kingbase: ksh writer
kingbase 16480 13089 0 14:59 ? 00:00:00 kingbase: ksh collector
kingbase 16481 13089 0 14:59 ? 00:00:00 kingbase: kwr collector
kingbase 16482 13089 0 14:59 ? 00:00:00 kingbase: logical replication launcher
kingbase 17100 13089 0 15:04 ? 00:00:00 kingbase: system prod 192.168.8.200(57476) idle
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 17214 13089 0 15:05 ? 00:00:00 kingbase: system prod [local] idle
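The `ps` listing above identifies a client backend by its process title (user, database, client address and port, state). As a minimal sketch, assuming that title format and IPv4 client addresses, the hypothetical helper `find_client_backends` below filters captured `ps -ef` output for the backends that serve one client. It only parses text and does not signal any process.

```python
import re

# Process title of a client backend as it appears in the listing above:
#   kingbase: <user> <database> <client_addr>(<port>) <state>
# Local connections show "[local]" instead of an address and are not matched.
BACKEND_TITLE = re.compile(
    r"kingbase: (?P<user>\S+) (?P<db>\S+) "
    r"(?P<addr>[\d.]+)\((?P<port>\d+)\) (?P<state>.+)$"
)

def find_client_backends(ps_output, client_addr):
    """Return (pid, client_port, state) for each backend serving client_addr."""
    found = []
    for line in ps_output.splitlines():
        match = BACKEND_TITLE.search(line)
        if match and match.group("addr") == client_addr:
            pid = int(line.split()[1])   # second column of `ps -ef` is the PID
            found.append(
                (pid, int(match.group("port")), match.group("state").strip())
            )
    return found

if __name__ == "__main__":
    sample = (
        "kingbase 16474 13089 0 14:59 ? 00:00:00 kingbase: checkpointer\n"
        "kingbase 17100 13089 0 15:04 ? 00:00:00 "
        "kingbase: system prod 192.168.8.200(57476) idle\n"
        "kingbase 17214 13089 0 15:05 ? 00:00:00 "
        "kingbase: system prod [local] idle\n"
    )
    print(find_client_backends(sample, "192.168.8.200"))
```

An empty result for the client address is the OS-side counterpart of the `(0 rows)` answer from sys_stat_activity.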
2. Using the operating system's kill pid
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 18424 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57480 | 2022-11-29 15:15:26.813035+08 | | 2022-11-29 15
:15:28.912910+08 | 2022-11-29 15:15:28.922719+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Run kill pid at the operating system level to end the process
[root@node2 sys_log]# kill 18424
# As shown below, the database processes are running normally and the backend process was killed safely
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 16474 13089 0 14:59 ? 00:00:00 kingbase: checkpointer
kingbase 16475 13089 0 14:59 ? 00:00:00 kingbase: background writer
kingbase 16476 13089 0 14:59 ? 00:00:00 kingbase: walwriter
kingbase 16477 13089 0 14:59 ? 00:00:00 kingbase: autovacuum launcher
kingbase 16478 13089 0 14:59 ? 00:00:00 kingbase: stats collector
kingbase 16479 13089 0 14:59 ? 00:00:00 kingbase: ksh writer
kingbase 16480 13089 0 14:59 ? 00:00:00 kingbase: ksh collector
kingbase 16481 13089 0 14:59 ? 00:00:00 kingbase: kwr collector
kingbase 16482 13089 0 14:59 ? 00:00:00 kingbase: logical replication launcher
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 17214 13089 0 15:05 ? 00:00:00 kingbase: system prod [local] idle
# The database view no longer has a record of this backend process
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
3. Operating system kill -15 pid and database sys_ctl kill TERM pid
1) Operating system kill -15 pid
# Query the backend process status
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 22955 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57498 | 2022-11-29 15:44:29.993305+08 | | 2022-11-29 15
:44:32.090913+08 | 2022-11-29 15:44:32.100617+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Run kill -15 pid at the operating system level to end the process
[kingbase@node2 bin]$ kill -15 22955
# As shown below, the database processes are running normally and the backend process was killed safely
[kingbase@node2 bin]$ ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 22197 13089 0 15:38 ? 00:00:00 kingbase: checkpointer
kingbase 22198 13089 0 15:38 ? 00:00:00 kingbase: background writer
kingbase 22199 13089 0 15:38 ? 00:00:00 kingbase: walwriter
kingbase 22200 13089 0 15:38 ? 00:00:00 kingbase: autovacuum launcher
kingbase 22201 13089 0 15:38 ? 00:00:00 kingbase: stats collector
kingbase 22202 13089 0 15:38 ? 00:00:00 kingbase: ksh writer
kingbase 22203 13089 0 15:38 ? 00:00:00 kingbase: ksh collector
kingbase 22204 13089 0 15:38 ? 00:00:00 kingbase: kwr collector
kingbase 22205 13089 0 15:38 ? 00:00:00 kingbase: logical replication launcher
kingbase 22444 12416 0 15:40 pts/0 00:00:00 ./ksql -U system test
kingbase 22445 13089 0 15:40 ? 00:00:00 kingbase: system test [local] idle
# The database view no longer has a record of this backend process
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
2) Database sys_ctl kill TERM pid
# Query the backend process status
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 22443 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57494 | 2022-11-29 15:40:42.804868+08 | | 2022-11-29 15
:40:44.972533+08 | 2022-11-29 15:40:44.985340+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Kill the process with the database command
[kingbase@node2 bin]$ ./sys_ctl kill TERM 22443
# As shown below, the database processes are running normally and the backend process was killed safely
[kingbase@node2 bin]$ ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 22197 13089 0 15:38 ? 00:00:00 kingbase: checkpointer
kingbase 22198 13089 0 15:38 ? 00:00:00 kingbase: background writer
kingbase 22199 13089 0 15:38 ? 00:00:00 kingbase: walwriter
kingbase 22200 13089 0 15:38 ? 00:00:00 kingbase: autovacuum launcher
kingbase 22201 13089 0 15:38 ? 00:00:00 kingbase: stats collector
kingbase 22202 13089 0 15:38 ? 00:00:00 kingbase: ksh writer
kingbase 22203 13089 0 15:38 ? 00:00:00 kingbase: ksh collector
kingbase 22204 13089 0 15:38 ? 00:00:00 kingbase: kwr collector
kingbase 22205 13089 0 15:38 ? 00:00:00 kingbase: logical replication launcher
kingbase 22444 12416 0 15:40 pts/0 00:00:00 ./ksql -U system test
kingbase 22445 13089 0 15:40 ? 00:00:00 kingbase: system test [local] idle
# The database view no longer has a record of this backend process
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
4. Operating system kill -3 pid and database sys_ctl kill QUIT pid
1) Operating system kill -3 pid
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
| backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+-------------------------------------------------------
--------------+----------------
16385 | prod | 18666 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57486 | 2022-11-29 15:17:34.726155+08 | | 2022-11-29 15
:17:34.736202+08 | 2022-11-29 15:17:34.740584+08 | Client | ClientRead | idle | | | select setting from pg_settings where name = 'enable_u
pper_colname' | client backend
(1 row)
# Run kill -3 pid at the operating system level to end the process
[root@node2 sys_log]# kill -3 18666
# As shown below, besides the targeted backend process, the other background auxiliary processes were also terminated and are being restarted
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 18795 13089 0 15:18 ? 00:00:00 kingbase: startup
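# Check the client session and the corresponding sys_log log (terminating the backend process this way also causes the database's auxiliary processes to restart)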
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
WARNING: terminating connection because of crash of another server process
DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2022-11-29 15:18:06.741 CST [18666] WARNING: terminating connection because of crash of another server process
2022-11-29 15:18:06.741 CST [18666] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:18:06.741 CST [18666] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:18:06.742 CST [13089] LOG: server process (PID 18666) exited with exit code 2
2022-11-29 15:18:06.742 CST [13089] DETAIL: Failed process was running: select setting from pg_settings where name = 'enable_upper_colname'
2022-11-29 15:18:06.742 CST [13089] LOG: terminating any other active server processes
2022-11-29 15:18:06.743 CST [17214] WARNING: terminating connection because of crash of another server process
2022-11-29 15:18:06.743 CST [17214] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:18:06.743 CST [17214] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:18:06.744 CST [16477] WARNING: terminating connection because of crash of another server process
2022-11-29 15:18:06.744 CST [16477] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:18:06.744 CST [16477] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:18:06.745 CST [13089] LOG: all server processes terminated; reinitializing
2022-11-29 15:18:06.823 CST [18795] LOG: database system was interrupted; last known up at 2022-11-29 14:59:17 CST
2022-11-29 15:19:29.897 CST [18795] LOG: database system was not properly shut down; automatic recovery in progress
2022-11-29 15:19:30.063 CST [18795] LOG: redo starts at 0/22935D8
2022-11-29 15:19:30.063 CST [18795] LOG: redo wal segment count 2
2022-11-29 15:19:30.063 CST [18795] LOG: invalid record length at 0/2293608: wanted 24, got 0
2022-11-29 15:19:30.063 CST [18795] LOG: redo done at 0/22935D8
2022-11-29 15:19:30.739 CST [13089] LOG: database system is ready to accept connections
# After the database service has restarted
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 18953 13089 0 15:19 ? 00:00:00 kingbase: checkpointer
kingbase 18954 13089 0 15:19 ? 00:00:00 kingbase: background writer
kingbase 18955 13089 0 15:19 ? 00:00:00 kingbase: walwriter
kingbase 18957 13089 0 15:19 ? 00:00:00 kingbase: stats collector
kingbase 18958 13089 0 15:19 ? 00:00:00 kingbase: ksh writer
kingbase 18959 13089 0 15:19 ? 00:00:00 kingbase: ksh collector
kingbase 18960 13089 0 15:19 ? 00:00:00 kingbase: kwr collector
--- As shown above, using kill -3 pid to terminate a backend process puts the database at considerable risk.
2) Database sys_ctl kill QUIT pid
# Query the backend process status
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 21894 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57490 | 2022-11-29 15:36:10.034020+08 | | 2022-11-29 15
:36:13.902728+08 | 2022-11-29 15:36:13.917841+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Terminate the backend process with the database command sys_ctl kill QUIT
[kingbase@node2 bin]$ ./sys_ctl kill QUIT 21894
# As shown below, besides the targeted backend process, the other background auxiliary processes were also terminated and are being restarted
[kingbase@node2 bin]$ ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 21895 12416 0 15:36 pts/0 00:00:00 ./ksql -U system test
kingbase 22071 13089 0 15:37 ? 00:00:00 kingbase: startup
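# Check the client session and the corresponding sys_log log (terminating the backend process this way also causes the database's auxiliary processes to restart)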
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
WARNING: terminating connection because of crash of another server process
DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2022-11-29 15:37:13.828 CST [21894] WARNING: terminating connection because of crash of another server process
2022-11-29 15:37:13.828 CST [21894] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:37:13.828 CST [21894] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:37:13.829 CST [13089] LOG: server process (PID 21894) exited with exit code 2
2022-11-29 15:37:13.829 CST [13089] DETAIL: Failed process was running: select count(*) from t1;
2022-11-29 15:37:13.829 CST [13089] LOG: terminating any other active server processes
2022-11-29 15:37:13.830 CST [21896] WARNING: terminating connection because of crash of another server process
2022-11-29 15:37:13.830 CST [21896] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:37:13.830 CST [21896] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:37:13.831 CST [18956] WARNING: terminating connection because of crash of another server process
2022-11-29 15:37:13.831 CST [18956] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:37:13.831 CST [18956] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:37:13.833 CST [13089] LOG: all server processes terminated; reinitializing
2022-11-29 15:37:13.933 CST [22071] LOG: database system was interrupted; last known up at 2022-11-29 15:19:30 CST
2022-11-29 15:38:29.154 CST [22071] LOG: database system was not properly shut down; automatic recovery in progress
2022-11-29 15:38:29.232 CST [22071] LOG: redo starts at 0/2293680
2022-11-29 15:38:29.232 CST [22071] LOG: redo wal segment count 2
2022-11-29 15:38:29.232 CST [22071] LOG: invalid record length at 0/22936B0: wanted 24, got 0
2022-11-29 15:38:29.232 CST [22071] LOG: redo done at 0/2293680
2022-11-29 15:38:29.767 CST [13089] LOG: database system is ready to accept connections
--- As shown above, using sys_ctl kill QUIT pid to terminate a backend process puts the database at considerable risk.
5. Operating system kill -9 pid
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start
| query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
| backend_type
-------+---------+-------+----------+---------+-------------------+---------------+-----------------+-------------+-------------------------------+--------------------------
-----+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+-------------------------
---------+------------------------------
16385 | prod | 14114 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57472 | 2022-11-29 14:48:06.372451+08 |
| 2022-11-29 14:50:16.618969+08 | 2022-11-29 14:50:16.627572+08 | Client | ClientRead | idle | | | select count(*) from t1;
| client backend
(1 row)
# Run kill -9 pid at the operating system level to end the process
[root@node2 ~]# kill -9 14114
# As shown below, besides the targeted backend process, the other background auxiliary processes were also terminated and are being restarted
[root@node2 ~]# ps -ef |grep kingbase
.......
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 14010 12416 0 14:47 pts/0 00:00:00 ./ksql -U system test
kingbase 15651 13089 0 14:57 ? 00:00:00 kingbase: startup
# Check the corresponding sys_log log (terminating the backend process this way also causes the database's auxiliary processes to restart)
2022-11-29 14:58:00.093 CST [13089] LOG: server process (PID 14114) was terminated by signal 9: Killed
2022-11-29 14:58:00.093 CST [13089] DETAIL: Failed process was running: select count(*) from t1;
2022-11-29 14:58:00.093 CST [13089] LOG: terminating any other active server processes
2022-11-29 14:58:00.093 CST [14602] WARNING: terminating connection because of crash of another server process
2022-11-29 14:58:00.093 CST [14602] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 14:58:00.093 CST [14602] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 14:58:00.095 CST [13095] WARNING: terminating connection because of crash of another server process
2022-11-29 14:58:00.095 CST [13095] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 14:58:00.095 CST [13095] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 14:58:00.384 CST [13089] LOG: all server processes terminated; reinitializing
2022-11-29 14:58:00.464 CST [15651] LOG: database system was interrupted; last known up at 2022-11-29 14:53:42 CST
2022-11-29 14:59:16.706 CST [15651] LOG: database system was not properly shut down; automatic recovery in progress
2022-11-29 14:59:16.806 CST [15651] LOG: redo starts at 0/2293488
2022-11-29 14:59:16.806 CST [15651] LOG: redo wal segment count 2
2022-11-29 14:59:16.806 CST [15651] LOG: invalid record length at 0/2293560: wanted 24, got 0
2022-11-29 14:59:16.806 CST [15651] LOG: redo done at 0/2293530
2022-11-29 14:59:17.563 CST [13089] LOG: database system is ready to accept connections
--- As shown above, using kill -9 pid to terminate a backend process puts the database at considerable risk.
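III. Summary
An abnormal backend process in a KingbaseES database can be terminated manually, but the method chosen determines the risk to the database:
1) Relatively safe methods: pg_terminate_backend(pid); kill pid; kill -15 pid; sys_ctl kill TERM pid.
2) Unsafe methods: kill -3 pid; sys_ctl kill QUIT pid; kill -9 pid.
Note: do not use kill -9 unless the consequences are acceptable. SIGKILL cannot be caught by a signal handler, so the OS stops the process immediately. When the Kingbase parent process detects that a child process exited abnormally, it stops all processes, releases shared memory, allocates shared memory again, and starts all processes back up.
The effect is the same as a crash restart: startup needs time for redo, which can interrupt service for several minutes. In the three unsafe runs above, the log shows roughly 76 to 84 seconds between "all server processes terminated; reinitializing" and "database system is ready to accept connections".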
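The classification in the summary can be turned into a small guard for operational scripts. The sketch below is an illustration only: `classify` and `terminate_backend` are hypothetical helpers, the safe/unsafe sets simply restate the results observed in this case, and signals that were not tested here are reported as "not covered" instead of being guessed at.

```python
import os
import signal

# Restates the summary: SIGTERM is what kill pid, kill -15 pid and
# sys_ctl kill TERM pid send; SIGQUIT corresponds to kill -3 pid and
# sys_ctl kill QUIT pid; SIGKILL corresponds to kill -9 pid.
RELATIVELY_SAFE = {signal.SIGTERM}
UNSAFE = {signal.SIGQUIT, signal.SIGKILL}

def classify(sig):
    """Return the risk class this case observed for a signal."""
    if sig in RELATIVELY_SAFE:
        return "relatively safe"
    if sig in UNSAFE:
        return "unsafe"
    return "not covered"

def terminate_backend(pid, sig=signal.SIGTERM, send=os.kill):
    """Send `sig` to `pid` only if it is classified as relatively safe.

    `send` defaults to os.kill and can be replaced for dry runs.
    """
    verdict = classify(sig)
    if verdict != "relatively safe":
        name = signal.Signals(sig).name
        raise ValueError(f"refusing to send {name}: classified as {verdict}")
    send(pid, sig)

if __name__ == "__main__":
    # Dry run: record what would be sent instead of signalling a real process.
    planned = []
    terminate_backend(12345, send=lambda pid, sig: planned.append((pid, sig)))
    print(planned)
    print(classify(signal.SIGQUIT), "/", classify(signal.SIGKILL))
```

Refusing by default keeps an accidental kill -3 or kill -9 equivalent from triggering the crash recovery shown in the logs above; where the database is reachable, pg_terminate_backend(pid) remains the first choice.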