The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:

1. How best to monitor Streaming Replication?

2. What is the best way to do that?

3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?

4. How should I calculate replication lag-time, in seconds, minutes, etc.?

In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.

Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:

1. Disaster recovery

2. Streaming Replication is for High Availability

3. Load balancing, when using Streaming Replication with Hot Standby

PostgreSQL has some building blocks for replication monitoring, and the following are some important functions and views which can be use for monitoring the replication:

1. pg_stat_replication view on master/primary server.

This view helps in monitoring the standby on Master. It gives you the following details:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
pid:              Process id of walsender process
usesysid:         OID of user which is used for Streaming replication.
usename:          Name of user which is used for Streaming replication
application_name: Application name connected to master
client_addr:      Address of standby/streaming replication
client_hostname:  Hostname of standby.
client_port:      TCP port number on which standby communicating with WAL sender
backend_start:    Start time when SR connected to Master.
state:            Current WAL sender state i.e streaming
sent_location:    Last transaction location sent to standby.
write_location:   Last transaction written on disk at standby
flush_location:   Last transaction flush on disk at standby.
replay_location:  Last transaction flush on disk at standby.
sync_priority:    Priority of standby server being chosen as synchronous standby
sync_state:       Sync State of standby (is it async or synchronous).

e.g.:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+---------------------------------
pid              | 1114
usesysid         | 16384
usename          | repuser
application_name | walreceiver
client_addr      | 172.17.0.3
client_hostname  |
client_port      | 52444
backend_start    | 15-MAY-14 19:54:05.535695 -04:00
state            | streaming
sent_location    | 0/290044C0
write_location   | 0/290044C0
flush_location   | 0/290044C0
replay_location  | 0/290044C0
sync_priority    | 0
sync_state       | async

2. pg_is_in_recovery() : Function which tells whether standby is still in recovery mode or not.

e.g.

1
2
3
4
5
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

3. pg_last_xlog_receive_location: Function which tells location of last transaction log which was streamed by Standby and also written on standby disk.

e.g.

1
2
3
4
5
postgres=# select pg_last_xlog_receive_location();
 pg_last_xlog_receive_location
-------------------------------
 0/29004560
(1 row)

4. pg_last_xlog_replay_location: Function which tells last transaction replayed during recovery process. e.g is given below:

1
2
3
4
5
postgres=# select pg_last_xlog_replay_location();
 pg_last_xlog_replay_location
------------------------------
 0/29004560
(1 row)

5. pg_last_xact_replay_timestamp: This function tells about the time stamp of last transaction which was replayed during recovery. Below is an example:

1
2
3
4
5
postgres=# select pg_last_xact_replay_timestamp();
  pg_last_xact_replay_timestamp
----------------------------------
 15-MAY-14 20:54:27.635591 -04:00
(1 row)

Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.

So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”

If you have Hot Standby with Streaming Replication, the following are the points you should monitor:

1. Check if your Hot Standby is in recovery mode or not:

For this you can use pg_is_in_recovery() function.

2.Check whether Streaming Replication is working or not.

And easy way of doing this is checking the pg_stat_replication view on Master/Primary. This view gives information only on master if Streaming Replication is working.

3. Check If Streaming Replication is not working and Hot standby is recovering from archived WAL file.

For this, either the DBA can use the PostgreSQL Log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:

1
2
pg_last_xlog_replay_location();
pg_last_xact_replay_timestamp();

4. Check how far off is the Standby from Master.

There are two ways to monitor lag for Standby.



   i. Lags in Bytes: For calculating lags in bytes, users can use thepg_stat_replication view on the master with the functionpg_xlog_location_diff function. Below is an example:

1
pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location)

which gives the lag in bytes.

  ii. Calculating lags in Seconds. The following is SQL, which most people uses to find the lag in seconds:

1
2
3
4
SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
              THEN 0
            ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
       END AS log_delay;

Including the above into your repertoire can give you good monitoring for PostgreSQL.

I will in a future post include the script that can be used for monitoring the Hot Standby with PostgreSQL streaming replication.

Since 9.6 this is a lot easier as it introduced the function pg_blocking_pids() to find the sessions that are blocking another session.

So you can use something like this:

How to check blocking processes in postgresql

select pid,
usename,
pg_blocking_pids(pid) as blocked_by,
query as blocked_query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;

14 - How to check replication status的更多相关文章

  1. java.lang.IllegalStateException: Failed to check the status of the service

    java.lang.IllegalStateException: Failed to check the status of the service com.pinyougou.sellergoods ...

  2. DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1603)

    安装greenplum集群出现以下错误: 20160315:13:49:16:025696 gpinitsystem:h95:jason-[INFO]:-Checking configuration ...

  3. check failed status == cudnn_status_success (4 vs. 0) cudnn_status_internal_error

    Check failed: error == cudaSuccess (30 vs. 0) unknown error  这个有可能是显存不足造成的,或者网络参数不对造成的 check failed ...

  4. 关于Failed to check the status of the service com.taotao.service.ItemService. No provider available fo

    原文:http://www.bubuko.com/infodetail-2250226.html 项目中用dubbo发生: Failed to check the status of the serv ...

  5. Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR

    I0930 21:23:15.115576 30918 solver.cpp:281] Learning Rate Policy: multistepF0930 21:23:17.263314 310 ...

  6. CUDA报错: Cannot create Cublas handle. Cublas won't be available. 以及:Check failed: status == CUBLAS_STATUS_SUCCESS (1 vs. 0) CUBLAS_STATUS_NOT_INITIALIZED

    Error描述: aita@aita-Alienware-Area-51-R5:~/AITA2/daisida/ssd-github/caffe$ make runtest -j8 .build_re ...

  7. node js fcoin api 出现 api key check fail : {"status":1090,"msg":"Illegal API signature"}

    //主区://ft / btc 不支持市价 买入数量不能小于5个FT 买//ft / eth 支持市价 最小买入eth不能小于0.01 买//ft / usdt 支持市价 最小买入usdt不能小于10 ...

  8. dubbo Failed to check the status of the service com.user.service.UserService. No provider available for the service

    Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'u ...

  9. springboot 报错nested exception is java.lang.IllegalStateException: Failed to check the status of the service xxxService No provider available for the service

    spring: dubbo:#关闭所有服务的启动时检查:(没有提供者时报错) consumer: check: false timeout: 3000

随机推荐

  1. javascript基础 之 表单

    1,js可以验证表单 实例1,js获取表单的内容 //html表单是这样的 <form name="myForm" action="demo_form.php&qu ...

  2. 帆软报表(finereport)安装/配置

    1.首先是安装帆软报表软件 下载地址:http://www.finereport.com/product/download           激活码注册格账号就有了 2.启动软件,新建连接数据库 点 ...

  3. 【原创】大数据基础之SPARK(9)SPARK中COLLECT和TAKE实现原理

    spark中要将计算结果取回driver,有两种方式:collect和take,这两种方式有什么差别?来看代码: org.apache.spark.rdd.RDD /** * Return an ar ...

  4. 完整备份和差异备份数据库的SQL脚本

    工作中需要创建SQL Job对数据库进行定期备份,现把脚本记录如下. 1. 完整备份: -- FULL declare @filename varchar(1024), @file_dev varch ...

  5. Windows Internals 笔记——内核对象

    1.每个内核对象都只是一个内存块,它由操作系统内核分配,并只能由操作系统内核访问.这个内存块是一个数据结构,其成员维护着与对象相关的信息. 2.调用一个会创建内核对象的函数后,函数会返回一个句柄,它标 ...

  6. 【winform】datagridview获取当前行停留时间

    RowStateChanged 的问题 RowStateChanged事件,也就是行状态发生变化时触发的事件,这个事件无法实现行号变化而触发这个要求,因为当我们从一行选择至另一行时,先触发原行号的状态 ...

  7. Notes for "Python in a Nutshell"

    Introduction to Python Wrap C/C++ libraries into Python via Cython and CFFI. Python implementations ...

  8. [八省联考2018]林克卡特树lct

    题解: zhcs的那个题基本上就是抄这个题的,不过背包的分数变成了70分.. 不过得分开来写..因为两个数组不能同时满足 背包的话就是 $f[i][j][0/1]$表示考虑i子树,取j条链,能不能向上 ...

  9. C#代码执行耗时计算,此处是监测的mvc控制器方法

    版权声明:本文为博主原创文章,未经博主允许不得转载. https://blog.csdn.net/u011511086/article/details/78710980using System;usi ...

  10. SpringMVC-2-(Controller)

    一)参数类型 @RequestMapping("hello4") @ResponseBody public ModelAndView Hello4( // Servlet的三个参数 ...