The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:

1. How best to monitor Streaming Replication?

2. What is the best way to do that?

3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?

4. How should I calculate replication lag-time, in seconds, minutes, etc.?

In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.

Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:

1. Disaster recovery

2. Streaming Replication is for High Availability

3. Load balancing, when using Streaming Replication with Hot Standby

PostgreSQL has some building blocks for replication monitoring, and the following are some important functions and views which can be use for monitoring the replication:

1. pg_stat_replication view on master/primary server.

This view helps in monitoring the standby on Master. It gives you the following details:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
pid:              Process id of walsender process
usesysid:         OID of user which is used for Streaming replication.
usename:          Name of user which is used for Streaming replication
application_name: Application name connected to master
client_addr:      Address of standby/streaming replication
client_hostname:  Hostname of standby.
client_port:      TCP port number on which standby communicating with WAL sender
backend_start:    Start time when SR connected to Master.
state:            Current WAL sender state i.e streaming
sent_location:    Last transaction location sent to standby.
write_location:   Last transaction written on disk at standby
flush_location:   Last transaction flush on disk at standby.
replay_location:  Last transaction flush on disk at standby.
sync_priority:    Priority of standby server being chosen as synchronous standby
sync_state:       Sync State of standby (is it async or synchronous).

e.g.:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+---------------------------------
pid              | 1114
usesysid         | 16384
usename          | repuser
application_name | walreceiver
client_addr      | 172.17.0.3
client_hostname  |
client_port      | 52444
backend_start    | 15-MAY-14 19:54:05.535695 -04:00
state            | streaming
sent_location    | 0/290044C0
write_location   | 0/290044C0
flush_location   | 0/290044C0
replay_location  | 0/290044C0
sync_priority    | 0
sync_state       | async

2. pg_is_in_recovery() : Function which tells whether standby is still in recovery mode or not.

e.g.

1
2
3
4
5
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

3. pg_last_xlog_receive_location: Function which tells location of last transaction log which was streamed by Standby and also written on standby disk.

e.g.

1
2
3
4
5
postgres=# select pg_last_xlog_receive_location();
 pg_last_xlog_receive_location
-------------------------------
 0/29004560
(1 row)

4. pg_last_xlog_replay_location: Function which tells last transaction replayed during recovery process. e.g is given below:

1
2
3
4
5
postgres=# select pg_last_xlog_replay_location();
 pg_last_xlog_replay_location
------------------------------
 0/29004560
(1 row)

5. pg_last_xact_replay_timestamp: This function tells about the time stamp of last transaction which was replayed during recovery. Below is an example:

1
2
3
4
5
postgres=# select pg_last_xact_replay_timestamp();
  pg_last_xact_replay_timestamp
----------------------------------
 15-MAY-14 20:54:27.635591 -04:00
(1 row)

Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.

So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”

If you have Hot Standby with Streaming Replication, the following are the points you should monitor:

1. Check if your Hot Standby is in recovery mode or not:

For this you can use pg_is_in_recovery() function.

2.Check whether Streaming Replication is working or not.

And easy way of doing this is checking the pg_stat_replication view on Master/Primary. This view gives information only on master if Streaming Replication is working.

3. Check If Streaming Replication is not working and Hot standby is recovering from archived WAL file.

For this, either the DBA can use the PostgreSQL Log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:

1
2
pg_last_xlog_replay_location();
pg_last_xact_replay_timestamp();

4. Check how far off is the Standby from Master.

There are two ways to monitor lag for Standby.



   i. Lags in Bytes: For calculating lags in bytes, users can use thepg_stat_replication view on the master with the functionpg_xlog_location_diff function. Below is an example:

1
pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location)

which gives the lag in bytes.

  ii. Calculating lags in Seconds. The following is SQL, which most people uses to find the lag in seconds:

1
2
3
4
SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
              THEN 0
            ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
       END AS log_delay;

Including the above into your repertoire can give you good monitoring for PostgreSQL.

I will in a future post include the script that can be used for monitoring the Hot Standby with PostgreSQL streaming replication.

Since 9.6 this is a lot easier as it introduced the function pg_blocking_pids() to find the sessions that are blocking another session.

So you can use something like this:

How to check blocking processes in postgresql

select pid,
usename,
pg_blocking_pids(pid) as blocked_by,
query as blocked_query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;

14 - How to check replication status的更多相关文章

  1. java.lang.IllegalStateException: Failed to check the status of the service

    java.lang.IllegalStateException: Failed to check the status of the service com.pinyougou.sellergoods ...

  2. DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1603)

    安装greenplum集群出现以下错误: 20160315:13:49:16:025696 gpinitsystem:h95:jason-[INFO]:-Checking configuration ...

  3. check failed status == cudnn_status_success (4 vs. 0) cudnn_status_internal_error

    Check failed: error == cudaSuccess (30 vs. 0) unknown error  这个有可能是显存不足造成的,或者网络参数不对造成的 check failed ...

  4. 关于Failed to check the status of the service com.taotao.service.ItemService. No provider available fo

    原文:http://www.bubuko.com/infodetail-2250226.html 项目中用dubbo发生: Failed to check the status of the serv ...

  5. Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR

    I0930 21:23:15.115576 30918 solver.cpp:281] Learning Rate Policy: multistepF0930 21:23:17.263314 310 ...

  6. CUDA报错: Cannot create Cublas handle. Cublas won't be available. 以及:Check failed: status == CUBLAS_STATUS_SUCCESS (1 vs. 0) CUBLAS_STATUS_NOT_INITIALIZED

    Error描述: aita@aita-Alienware-Area-51-R5:~/AITA2/daisida/ssd-github/caffe$ make runtest -j8 .build_re ...

  7. node js fcoin api 出现 api key check fail : {"status":1090,"msg":"Illegal API signature"}

    //主区://ft / btc 不支持市价 买入数量不能小于5个FT 买//ft / eth 支持市价 最小买入eth不能小于0.01 买//ft / usdt 支持市价 最小买入usdt不能小于10 ...

  8. dubbo Failed to check the status of the service com.user.service.UserService. No provider available for the service

    Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'u ...

  9. springboot 报错nested exception is java.lang.IllegalStateException: Failed to check the status of the service xxxService No provider available for the service

    spring: dubbo:#关闭所有服务的启动时检查:(没有提供者时报错) consumer: check: false timeout: 3000

随机推荐

  1. 主机服务绑定IP

    在用 netstat -na  查看当前主机提供的服务,例如显示如下结果: tcp        0      0 127.0.0.1:9000              0.0.0.0:*      ...

  2. Mac 解决 Sourcetree 同步代码总需要密码的问题

    git config --global credential.helper osxkeychain

  3. C#+EntityFramework编程方式详细之Database First

    Database First “Database First”模式即“数据库优先”,其实Database First 与Model First 很类似,只不过一个是有数据可一个是创建数据库,具体的操作 ...

  4. MySQL查询表中某个字段的重复数据

    1. 查询SQL表中某个字段的重复数据 SELECT user_name,COUNT(*) AS count FROM db_user_info GROUP BY user_name HAVING c ...

  5. 随机获取min和max之间的一个整数

    // 随机获取min和max之间的一个整数 const randomNum = (Min, Max) => { let Range = Max - Min; let Rand = Math.ra ...

  6. jquery 第三章

    1.回顾$(document).ready(function(){    })$(function(){    }) ID选择器.类选择器.元素选择器层次选择器:空格(上文下:tr td{})属性过滤 ...

  7. python第10天(上)

    multiprocessing包是Python中的多进程管理包.与threading.Thread类似,它可以利用multiprocessing.Process对象来创建一个进程.该进程可以运行在Py ...

  8. mysql触发器Before和After的区别

    Before与After区别:before:(insert.update)可以对new进行修改.                    after不能对new进行修改.                 ...

  9. 在SOUI中使用线性布局

    SOUI 2.5.1.1开始支持线性布局(LinearLayout). 要在SOUI布局中使用线性布局, 需要在布局容器窗口里指定布局类型为vbox | hbox, (vbox为垂直线性布局, hbo ...

  10. CodeForces 553E Kyoya and Train 动态规划 多项式 FFT 分治

    原文链接http://www.cnblogs.com/zhouzhendong/p/8847145.html 题目传送门 - CodeForces 553E 题意 一个有$n$个节点$m$条边的有向图 ...