Hi,

As part of our monitoring work for our customers, we stumbled upon an issue on customer servers that have wal_keep_segments set higher than 0.

We have a monitoring script that checks the number of WAL files in the pg_xlog directory against a threshold derived from three parameters (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments). We usually add a percentage to the usual formula:

greatest(
  (2 + checkpoint_completion_target) * checkpoint_segments + 1,
  checkpoint_segments + wal_keep_segments + 1
)

We get lots of alerts from the script for customers who set wal_keep_segments higher than 0.
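
For readers who want to reproduce the check, here is a minimal sketch of the threshold described above. The function names and the margin_pct parameter are hypothetical (the actual monitoring script is not shown in this thread), and the example values are only illustrative.

```python
import math


def documented_max_wal_files(checkpoint_segments,
                             checkpoint_completion_target,
                             wal_keep_segments):
    """Bound from the 9.3 documentation: the greater of the two expressions."""
    by_checkpoints = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    by_keep = checkpoint_segments + wal_keep_segments + 1
    return max(by_checkpoints, by_keep)


def alert_threshold(bound, margin_pct=10):
    """Add a safety margin (hypothetical default of 10%) and round up."""
    return math.ceil(bound * (1 + margin_pct / 100))


if __name__ == "__main__":
    # Illustrative settings: checkpoint_segments=3,
    # checkpoint_completion_target=0.5, wal_keep_segments=100.
    bound = documented_max_wal_files(3, 0.5, 100)
    print("documented bound:", bound)
    print("alert threshold:", alert_threshold(bound))
```

With wal_keep_segments well above checkpoint_segments, the second expression dominates, which is exactly the case where the alerts fire.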

So we started to question this sentence of the documentation:

There will always be at least one WAL segment file, and will normally not be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or checkpoint_segments + wal_keep_segments + 1 files.

(http://www.postgresql.org/docs/9.3/static/wal-configuration.html)

Our tests suggest it is closer to:

wal_keep_segments + (2 + checkpoint_completion_target) * checkpoint_segments + 1

But after reading the source code (src/backend/access/transam/xlog.c), the right formula seems to be:

wal_keep_segments + 2 * checkpoint_segments + 1

Here is how we arrived at this formula...

CreateCheckPoint(..) is responsible, among other things, for deleting and recycling old WAL files. From src/backend/access/transam/xlog.c, master branch, line 8363:

/*
 * Delete old log files (those no longer needed even for previous
 * checkpoint or the standbys in XLOG streaming).
 */
if (_logSegNo)
{
    KeepLogSeg(recptr, &_logSegNo);
    _logSegNo--;
    RemoveOldXlogFiles(_logSegNo, recptr);
}

The KeepLogSeg(...) function takes care of wal_keep_segments. From src/backend/access/transam/xlog.c, master branch, line 8792:

/* compute limit for wal_keep_segments first */
if (wal_keep_segments > 0)
{
    /* avoid underflow, don't go below 1 */
    if (segno <= wal_keep_segments)
        segno = 1;
    else
        segno = segno - wal_keep_segments;
}
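
The clamping in the excerpt above can be modelled in a few lines. This is only a sketch of the quoted fragment, not of the whole KeepLogSeg() function; keep_log_seg is a hypothetical name and plain Python integers stand in for the segment-number type.

```python
def keep_log_seg(segno, wal_keep_segments):
    """Model of the quoted fragment: step back wal_keep_segments segments
    from segno, but never below segment number 1."""
    if wal_keep_segments > 0:
        if segno <= wal_keep_segments:
            # Avoid underflow, don't go below 1.
            segno = 1
        else:
            segno = segno - wal_keep_segments
    return segno


if __name__ == "__main__":
    # Illustrative values only.
    for segno, keep in [(100, 32), (10, 32), (100, 0)]:
        print(segno, keep, "->", keep_log_seg(segno, keep))
```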

IOW, the segment number (segno) is decremented according to the setting of wal_keep_segments. segno is then sent back to CreateCheckPoint(...) via _logSegNo. RemoveOldXlogFiles() receives this segment number so that it can remove or recycle all files before it. The number of WAL files it may recycle is bounded by the XLOGfileslop constant, which is defined as:

/*
 * XLOGfileslop is the maximum number of preallocated future XLOG segments.
 * When we are done with an old XLOG segment file, we will recycle it as a
 * future XLOG segment as long as there aren't already XLOGfileslop future
 * segments; else we'll delete it. This could be made a separate GUC
 * variable, but at present I think it's sufficient to hardwire it as
 * 2*CheckPointSegments+1. Under normal conditions, a checkpoint will free
 * no more than 2*CheckPointSegments log segments, and we want to recycle all
 * of them; the +1 allows boundary cases to happen without wasting a
 * delete/create-segment cycle.
 */
#define XLOGfileslop (2*CheckPointSegments + 1)

(in src/backend/access/transam/xlog.c, master branch, line 100)
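
The recycle-or-delete rule described in that comment can be sketched as follows. This is a simplified model based only on the comment text, not on the actual RemoveOldXlogFiles() code; the function names and the idea of passing segment counts as plain integers are assumptions made for illustration.

```python
def xlog_file_slop(checkpoint_segments):
    """Mirror of the XLOGfileslop macro: 2*CheckPointSegments + 1."""
    return 2 * checkpoint_segments + 1


def recycle_or_delete(old_segments, future_segments, checkpoint_segments):
    """Split old segments into (recycled, deleted).

    Old segments are recycled as future segments until XLOGfileslop future
    segments exist; any remaining old segments are deleted.
    """
    room = max(0, xlog_file_slop(checkpoint_segments) - future_segments)
    recycled = min(old_segments, room)
    deleted = old_segments - recycled
    return recycled, deleted


if __name__ == "__main__":
    # Illustrative: 10 old segments, no future segments yet,
    # checkpoint_segments=3.
    print(recycle_or_delete(10, 0, 3))
```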

IOW, PostgreSQL will keep wal_keep_segments WAL files before the current WAL file, and then there may be 2*CheckPointSegments + 1 recycled ones. Hence the formula:

wal_keep_segments + 2 * checkpoint_segments + 1

And this is what we usually find on our customers' servers. We may find more WAL files, depending on the write activity of the cluster, but on average, we get this number of WAL files.
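
To make the gap concrete, the sketch below evaluates the documented bound and the formula derived above side by side. The function names are hypothetical and the settings are illustrative values, not measurements from any server.

```python
def documented_bound(checkpoint_segments, checkpoint_completion_target,
                     wal_keep_segments):
    """Greater of the two expressions given in the 9.3 documentation."""
    return max(
        (2 + checkpoint_completion_target) * checkpoint_segments + 1,
        checkpoint_segments + wal_keep_segments + 1,
    )


def proposed_estimate(checkpoint_segments, wal_keep_segments):
    """Formula derived from the source reading above:
    wal_keep_segments + 2 * checkpoint_segments + 1."""
    return wal_keep_segments + 2 * checkpoint_segments + 1


if __name__ == "__main__":
    # Illustrative settings: checkpoint_segments=32,
    # checkpoint_completion_target=0.9, wal_keep_segments=128.
    print("documented:", documented_bound(32, 0.9, 128))
    print("proposed:  ", proposed_estimate(32, 128))
```

Whenever wal_keep_segments dominates the documented bound, the two results differ by checkpoint_segments, which would explain alerts even with a modest percentage margin.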

AFAICT, the documentation is wrong about the usual number of WAL files in the pg_xlog directory. But I may be wrong, in which case, the documentation isn't clear enough for me, and should be fixed so that others can't misinterpret it like I may have done.

Any comments? Did I miss something, or should we fix the documentation?

Thanks.

Bruce Momjian:

I looked into this, and came up with more questions.  Why is 
checkpoint_completion_target involved in the total number of WAL 
segments?  If checkpoint_completion_target is 0.5 (the default), the 
calculation is:

(2 + 0.5) * checkpoint_segments + 1

while if it is 0.9, it is:

(2 + 0.9) * checkpoint_segments + 1

Is this trying to estimate how many WAL files are going to be created 
during the checkpoint?  If so, wouldn't it be (1 + 
checkpoint_completion_target), not "2 +".  My logic is you have the old 
WAL files being checkpointed (that's the "1"), plus you have new WAL 
files being created during the checkpoint, which would be 
checkpoint_completion_target * checkpoint_segments, plus one for the 
current WAL file.

The original calculation is summarized in this email:

http://www.postgresql.org/message-id/AANLkTi=e=oR54OuxAw88=dtV4wt0e5edMiGaeZtBVcKO@...

However, in my reading of this, it appears to be double-counting the WAL 
files during the checkpoint, e.g. the checkpoint_completion_target * 
checkpoint_segments WAL files are also part of the later 
checkpoint_segments number.

I also don't see how that can be equivalent to:

checkpoint_segments + wal_keep_segments + 1

because wal_keep_segments isn't used in the first calculation.  Is the 
user supposed to compute the maximum of those two?  Seems easier to just 
give one expression.

Is the right answer:

max(checkpoint_segments, wal_keep_segments) + checkpoint_segments + 1

or, if you want to use checkpoint_completion_target, it would be:

max(checkpoint_segments * checkpoint_completion_target, wal_keep_segments) + checkpoint_segments + 1

Is checkpoint_completion_target accurate enough to define a maximum 
number of files?
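
The two candidate expressions above are easy to evaluate for a given configuration. This is only a sketch for comparing them; the function names are hypothetical, and neither expression is confirmed as correct at this point in the thread.

```python
def candidate_simple(checkpoint_segments, wal_keep_segments):
    """max(checkpoint_segments, wal_keep_segments) + checkpoint_segments + 1"""
    return max(checkpoint_segments, wal_keep_segments) + checkpoint_segments + 1


def candidate_with_target(checkpoint_segments, checkpoint_completion_target,
                          wal_keep_segments):
    """max(checkpoint_segments * checkpoint_completion_target,
           wal_keep_segments) + checkpoint_segments + 1"""
    return max(checkpoint_segments * checkpoint_completion_target,
               wal_keep_segments) + checkpoint_segments + 1


if __name__ == "__main__":
    # Illustrative settings only.
    for cs, cct, wks in [(3, 0.5, 0), (3, 0.5, 100)]:
        print(cs, cct, wks,
              candidate_simple(cs, wks),
              candidate_with_target(cs, cct, wks))
```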

I think I need Masao Fujii's comments on this.  The fact the user is 
seeing something different from what is documented means something 
probably needs updating.

Jeff Janes:

> I looked into this, and came up with more questions.  Why is
> checkpoint_completion_target involved in the total number of WAL
> segments?  If checkpoint_completion_target is 0.5 (the default), the
> calculation is:

> (2 + 0.5) * checkpoint_segments + 1

> while if it is 0.9, it is:

> (2 + 0.9) * checkpoint_segments + 1

> Is this trying to estimate how many WAL files are going to be created
> during the checkpoint?  If so, wouldn't it be (1 +
> checkpoint_completion_target), not "2 +".  My logic is you have the old
> WAL files being checkpointed (that's the "1"), plus you have new WAL
> files being created during the checkpoint, which would be
> checkpoint_completion_target * checkpoint_segments, plus one for the
> current WAL file.

 
WAL is not eligible to be recycled until there have been 2 successful checkpoints.
 
So at the end of a checkpoint, you have 1 cycle of WAL which has just become eligible for recycling, 1 cycle of WAL which is now expendable but which is kept anyway, and checkpoint_completion_target worth of WAL which was written while the checkpoint was running and is still needed for crash recovery.
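
The breakdown above accounts for the "2 +" in the documented expression. A minimal sketch, assuming one "cycle" of WAL equals checkpoint_segments files (the function and key names are hypothetical):

```python
def wal_at_checkpoint_end(checkpoint_segments, checkpoint_completion_target):
    """Return the three components described above and their total."""
    parts = {
        # WAL from two checkpoints ago: just became eligible for recycling.
        "just_eligible_for_recycling": checkpoint_segments,
        # WAL from the previous cycle: expendable, but kept anyway.
        "expendable_but_kept": checkpoint_segments,
        # WAL written while the checkpoint itself was running.
        "written_during_checkpoint":
            checkpoint_completion_target * checkpoint_segments,
    }
    return parts, sum(parts.values())


if __name__ == "__main__":
    # Illustrative settings: checkpoint_segments=10,
    # checkpoint_completion_target=0.5.
    parts, total = wal_at_checkpoint_end(10, 0.5)
    print(parts)
    print("total:", total)
```

The total is (2 + checkpoint_completion_target) * checkpoint_segments; the documented expression then adds 1 to it.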
 
I don't really understand the point of this way of doing things.  I guess it is because the control file contains two redo pointers, one for the last checkpoint and one for the checkpoint before it, and if recovery finds that it can't use the most recent one it tries the previous one.  Why?  Beats me.  If we are worried about the control file getting a corrupt redo pointer, it seems like we would record the last one twice, rather than recording two different ones once each.  And if the in-memory version got corrupted before being written to the file, I really doubt anything is going to save your bacon at that point.
 
I've never seen a case where recovery couldn't use the last recorded good checkpoint, so instead used the previous one, and was successful at it.  But then again I haven't seen all possible crashes.
 
This is based on memory from the last time I looked into this; I haven't re-verified it, so it could be wrong or obsolete.
 
 
 
> WAL is not eligible to be recycled until there have been 2 successful 
> checkpoints. 

> So at the end of a checkpoint, you have 1 cycle of WAL which has just become 
> eligible for recycling, 
> 1 cycle of WAL which is now expendable but which is kept anyway, and 
> checkpoint_completion_target worth of WAL which has occurred while the 
> checkpoint was occurring and is still needed for crash recovery.

OK, so based on this analysis, what is the right calculation?  This?

(1 + checkpoint_completion_target) * checkpoint_segments + 1 + 
        max(wal_keep_segments, checkpoint_segments) 

 
 
 
> AFAICT, the documentation is wrong about the usual number of WAL files in 
> the pg_xlog directory. But I may be wrong, in which case, the documentation 
> isn't clear enough for me, and should be fixed so that others can't 
> misinterpret it like I may have done. 

> Any comments? did I miss something, or should we fix the documentation?

I think you're right. The correct formula for the number of WAL files in 
pg_xlog seems to be

(3 + checkpoint_completion_target) * checkpoint_segments + 1

or

wal_keep_segments + 2 * checkpoint_segments + 1

Why? At the end of a checkpoint, the WAL files generated since the start 
of the previous checkpoint cannot be removed and must remain in pg_xlog. 
The number of them is

(1 + checkpoint_completion_target) * checkpoint_segments

or

wal_keep_segments

Also, at the end of a checkpoint, as you pointed out, if there are enough 
old WAL files, 2 * checkpoint_segments + 1 of them will be recycled. 
checkpoint_segments WAL files will then be consumed by the end of the next 
checkpoint, but since 2 * checkpoint_segments + 1 recycled WAL files already 
exist, no new files are created. So pg_xlog can hold both the WAL files that 
cannot be removed and the ones recycled at the end of a checkpoint, and their 
total is given by the formula above.

If my understanding is right, we need to change the formula in the documentation.
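
The reasoning in this reply can be sketched as "files that must be kept" plus "files that may have been recycled". Taking the larger of the two "must keep" alternatives is an assumption about how the two formulas are meant to combine, and the function name is hypothetical.

```python
def estimated_wal_files(checkpoint_segments, checkpoint_completion_target,
                        wal_keep_segments):
    """Files that cannot be removed plus up to 2*checkpoint_segments + 1
    recycled files."""
    # Assumption: whichever retention requirement is larger applies.
    must_keep = max(
        (1 + checkpoint_completion_target) * checkpoint_segments,
        wal_keep_segments,
    )
    recycled = 2 * checkpoint_segments + 1
    return must_keep + recycled


if __name__ == "__main__":
    # Illustrative settings only.
    print(estimated_wal_files(3, 0.5, 0))
    print(estimated_wal_files(3, 0.5, 100))
```

When wal_keep_segments is 0 this reduces to (3 + checkpoint_completion_target) * checkpoint_segments + 1; when wal_keep_segments dominates it reduces to wal_keep_segments + 2 * checkpoint_segments + 1.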

 
 
 
 