Jeff Janes:

> Hi,
>
> As part of our monitoring work for our customers, we stumbled upon an issue with customers' servers that have a wal_keep_segments setting higher than 0.
>
> We have a monitoring script that checks the number of WAL files in the pg_xlog directory against the settings of three parameters (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments). We usually add a percentage to the usual formula:
>
> greatest(
>   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
>   checkpoint_segments + wal_keep_segments + 1
> )
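
As a rough illustration of the check described in the quoted message, the documented formula plus a percentage margin could be computed as below. This is only a sketch: the function names, the rounding up to whole segments, and the 10% default margin are assumptions, not taken from the actual monitoring script.

```python
import math


def documented_wal_estimate(checkpoint_segments, checkpoint_completion_target,
                            wal_keep_segments=0):
    """Segment count from the documentation formula quoted above.

    Rounding up to a whole segment is an assumption of this sketch.
    """
    by_checkpoints = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    by_keep = checkpoint_segments + wal_keep_segments + 1
    return math.ceil(max(by_checkpoints, by_keep))


def alert_threshold(estimate, margin_pct=10):
    """Estimate plus a safety margin (hypothetical 10% default).

    Integer ceiling division avoids floating-point rounding surprises.
    """
    return -(-estimate * (100 + margin_pct) // 100)


print(documented_wal_estimate(3, 0.5))        # 9
print(documented_wal_estimate(3, 0.5, 100))   # 104
print(alert_threshold(104))                   # 115
```

With wal_keep_segments at 0 the first branch dominates; once wal_keep_segments is large, the second branch takes over.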

 
I think the first bug is even having this formula in the documentation to start with, and in trying to use it.
 
"and will normally not be more than..."
 
This may be "normal" for a toy system.  I think that the normal state for any system worth monitoring is that it has had load spikes at some point in the past.  
 
So it is the next part of the doc, which describes how many segments it climbs back down to upon recovering from a spike, which is the important one.  And that doesn't mention wal_keep_segments at all, which surely cannot be correct.
 
I will try to independently derive the correct formula from the code, as you did, without looking too much at your derivation first, and see if we get the same answer.
 
 
Guillaume Lelarge:

> I think the first bug is even having this formula in the documentation to start with, and in trying to use it.
 
 
I agree. But we have customers asking how to compute the right size for their WAL file system partitions. "Right size" is usually a euphemism for "smallest size", and they tend to get it wrong, leading to huge issues. And I'm not even speaking of monitoring and alerting.

A way to avoid this issue is probably to remove the formula from the documentation, and find a new way to explain to them how to size their partitions for WAL.

Monitoring is another matter, and I don't really think a monitoring solution should count the WAL files. What actually really matters is the database availability, and that is covered with having enough disk space in the WALs partition.
 
 
"and will normally not be more than..."
 
> This may be "normal" for a toy system. I think that the normal state for any system worth monitoring is that it has had load spikes at some point in the past.
 
 
Agreed.
 
 
> So it is the next part of the doc, which describes how many segments it climbs back down to upon recovering from a spike, which is the important one. And that doesn't mention wal_keep_segments at all, which surely cannot be correct.
 
 
Agreed too.
 
 
> I will try to independently derive the correct formula from the code, as you did, without looking too much at your derivation first, and see if we get the same answer.
 
 
Thanks. I look forward to reading what you find.
 
What seems clear to me right now is that no one has a sane explanation of the formula. Though yours definitely made sense, it didn't seem to be what the code does.

Josh Berkus:

> Monitoring is another matter, and I don't really think a monitoring 
> solution should count the WAL files. What actually really matters is the 
> database availability, and that is covered with having enough disk space in 
> the WALs partition.

If we don't count the WAL files, though, that eliminates the best way to 
detect when archiving is failing.

Guillaume Lelarge:

> If we don't count the WAL files, though, that eliminates the best way to
> detect when archiving is failing.

 
WAL files don't give you this directly. A large number of WAL files may look like a problem, but it can just be a spike of changes. Counting .ready files makes more sense when you're trying to see whether WAL archiving is failing. And now, using pg_stat_archiver is the way to go (thanks Gabriele :) ).
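
A minimal sketch of the .ready-file check mentioned above, assuming the monitoring host can read the archive_status subdirectory of pg_xlog. The function name, the example path, and the threshold are hypothetical. On 9.4 and later, the pg_stat_archiver view (for example its failed_count and last_failed_wal columns) gives the same signal from SQL.

```python
from pathlib import Path


def count_ready_files(archive_status_dir):
    """Count WAL segments still waiting to be archived (*.ready markers)."""
    return sum(1 for p in Path(archive_status_dir).glob("*.ready") if p.is_file())


# Hypothetical usage; the data directory path and the threshold are assumptions.
# backlog = count_ready_files("/var/lib/postgresql/9.4/main/pg_xlog/archive_status")
# if backlog > 10:
#     print("WAL archiving may be failing or falling behind:", backlog)
```

A steadily growing count of .ready markers points at archiving trouble, whereas a large number of WAL files alone may only reflect a write spike.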

Josh Berkus:

>> If we don't count the WAL files, though, that eliminates the best way to
>> detect when archiving is failing.
>
> WAL files don't give you this directly. You may think it's an issue to get 
> a lot of WAL files, but it can just be a spike of changes. Counting .ready 
> files makes more sense when you're trying to see if wal archiving is 
> failing. And now, using pg_stat_archiver is the way to go (thanks Gabriele 
> :) ).

Yeah, a situation where we can't give our users any kind of reasonable 
monitoring threshold at all sucks though.  Also, it makes it kind of 
hard to allocate a wal partition if it could be 10X the minimum size, 
you know?

What happened to the work Heikki was doing on making transaction log 
disk usage sane?

 
Jeff Janes, did you find time to work on this? Any news?
 
 
Jeff Janes:

On Wed, Oct 15, 2014 at 1:11 PM, Jeff Janes <[hidden email]> wrote:

> On Fri, Aug 8, 2014 at 12:08 AM, Guillaume Lelarge <[hidden email]> wrote:
>
>> We usually add a percentage to the usual formula:
>>
>> greatest(
>>   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
>>   checkpoint_segments + wal_keep_segments + 1
>> )
>
> I will try to independently derive the correct formula from the code, as you did, without looking too much at your derivation first, and see if we get the same answer.
 
It looked to me that the formula, when descending from a previously stressed state, would be:
 
greatest((1 + checkpoint_completion_target) * checkpoint_segments, wal_keep_segments) + 1 + 
2 * checkpoint_segments + 1 
 
This assumes logs are filled evenly over a checkpoint cycle, which is probably not true because there is a spike in full page writes right after a checkpoint starts.
 
But I didn't have a great deal of confidence in my analysis.
 
The first line reflects the number of WAL files that will be retained as-is; the second is the number that will be recycled for future use before the server starts deleting them.
 
My reading of the code is that wal_keep_segments is computed from the current end of WAL (i.e., the checkpoint record), not from the checkpoint redo point.  If I distribute the part outside the 'greatest' into both branches of the 'greatest', I don't get the same answer as you do for either branch.
 
Then I started wondering if the number we keep for recycling is a good choice, anyway.  2 * checkpoint_segments + 1 seems pretty large.  But then again, given that we've reached the high-water-mark once, how unlikely are we to reach it again?
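
To make the two parts of that formula concrete, here is a small sketch that computes the retained and recycled counts separately. The function names and the rounding up to whole segments are assumptions of the sketch; the arithmetic follows the formula as written in this message, which its author described as tentative.

```python
import math


def retained_segments(checkpoint_segments, checkpoint_completion_target,
                      wal_keep_segments):
    """First line of the formula: segments retained as-is, plus one."""
    kept = max((1 + checkpoint_completion_target) * checkpoint_segments,
               wal_keep_segments)
    return math.ceil(kept) + 1  # rounding up is an assumption


def recycled_segments(checkpoint_segments):
    """Second line of the formula: segments kept around for recycling."""
    return 2 * checkpoint_segments + 1


def descending_estimate(checkpoint_segments, checkpoint_completion_target,
                        wal_keep_segments):
    """Total segments when descending from a previously stressed state."""
    return (retained_segments(checkpoint_segments, checkpoint_completion_target,
                              wal_keep_segments)
            + recycled_segments(checkpoint_segments))


print(descending_estimate(10, 0.5, 0))    # 16 retained + 21 recycled = 37
print(descending_estimate(10, 0.5, 100))  # 101 retained + 21 recycled = 122
```

For comparison, the documented formula gives 26 and 111 for the same two settings, so the recycled segments account for most of the gap.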
 
 
Bruce Momjian:

> It looked to me that the formula, when descending from a previously stressed
> state, would be:
>
> greatest((1 + checkpoint_completion_target) * checkpoint_segments,
> wal_keep_segments) + 1 +
> 2 * checkpoint_segments + 1

I don't think we can assume checkpoint_completion_target is at all 
reliable enough to base a maximum calculation on, assuming anything 
above the maximum is cause for concern and something to inform the admins 
about.

Assuming checkpoint_completion_target is 1 for maximum purposes, how 
about:

max(2 * checkpoint_segments, wal_keep_segments) + 2 * checkpoint_segments + 2

 
 

Guillaume Lelarge:

Sorry for my very late answer. It's been a tough month.

2014-11-27 0:00 GMT+01:00 Bruce Momjian <[hidden email]>:

> On Mon, Nov  3, 2014 at 12:39:26PM -0800, Jeff Janes wrote:
>> greatest((1 + checkpoint_completion_target) * checkpoint_segments,
>> wal_keep_segments) + 1 +
>> 2 * checkpoint_segments + 1
>
> Assuming checkpoint_completion_target is 1 for maximum purposes, how
> about:
>
> max(2 * checkpoint_segments, wal_keep_segments) + 2 * checkpoint_segments + 2

 
 
Seems something I could agree on. At least, it makes sense, and it works for my customers. Although I'm wondering why "+ 2", and not "+ 1". It seems Jeff and you agree on this, so I may have misunderstood something.
 
 
Jeff Janes:

On Tue, Dec 30, 2014 at 12:35 AM, Guillaume Lelarge <[hidden email]> wrote:

> Sorry for my very late answer. It's been a tough month.
>
> 2014-11-27 0:00 GMT+01:00 Bruce Momjian <[hidden email]>:
>
>> max(2 * checkpoint_segments, wal_keep_segments) + 2 * checkpoint_segments + 2
>
> Seems something I could agree on. At least, it makes sense, and it works for my customers. Although I'm wondering why "+ 2", and not "+ 1". It seems Jeff and you agree on this, so I may have misunderstood something.
 
From hazy memory, one +1 comes from the currently active WAL file, which exists but is counted towards neither wal_keep_segments nor the recycled files.  And the other +1 comes from the formula for how many recycled files to retain, which explicitly has a +1 in it.
 
 
Guillaume Lelarge:
 
OK, that seems much better. Thanks, Jeff.
 
 
> OK, so based on this analysis, what is the right calculation?  This? 

> (1 + checkpoint_completion_target) * checkpoint_segments + 1 + 
> max(wal_keep_segments, checkpoint_segments)

Now that we have min_wal_size and max_wal_size in 9.5, I don't see any 
value in figuring out the proper formula for backpatching. 

 
 

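Notes:

  1. In the official documentation for PostgreSQL 9.1 through 9.4, the number of WAL files stored in pg_xlog is given as ( 2 + checkpoint_completion_target ) * checkpoint_segments + 1. As the PostgreSQL experts' discussion above shows, that formula is flawed; a more accurate one is: max(2 * checkpoint_segments, wal_keep_segments) + 2 * checkpoint_segments + 2
  2. PostgreSQL 9.5 adds two new parameters, min_wal_size and max_wal_size. max_wal_size and checkpoint_completion_target together control how much WAL is generated before a checkpoint is triggered, while min_wal_size and max_wal_size control which WAL segments can be recycled. For details, see the blog post by 德哥.
  3. This June's issue of the Taobao database kernel monthly report also covers this problem. They ran into it because their WAL grew too large, and the formula they arrived at is essentially the same as the one above, except that checkpoint_completion_target is not fixed at 1: max(wal_keep_segments, checkpoint_segments + checkpoint_segments*checkpoint_completion_target) + 2 * checkpoint_segments + 1 + 1. It is worth a read if you are interested, though it is nowhere near as entertaining as the debate above.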
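
The relationship between the two formulas in notes 1 and 3 can be checked numerically. The sketch below uses hypothetical function names and assumes the default 16 MB WAL segment size to turn a segment count into an approximate partition size; builds with a different segment size would need a different constant.

```python
SEGMENT_MB = 16  # assumption: default WAL segment size


def worst_case_segments(checkpoint_segments, wal_keep_segments):
    """Upper bound from note 1 (checkpoint_completion_target taken as 1)."""
    return (max(2 * checkpoint_segments, wal_keep_segments)
            + 2 * checkpoint_segments + 2)


def taobao_segments(checkpoint_segments, checkpoint_completion_target,
                    wal_keep_segments):
    """Formula from note 3, with checkpoint_completion_target left variable."""
    return (max(wal_keep_segments,
                checkpoint_segments
                + checkpoint_segments * checkpoint_completion_target)
            + 2 * checkpoint_segments + 1 + 1)


for cs, wks in [(3, 0), (10, 100), (64, 32)]:
    # With checkpoint_completion_target = 1 the two formulas coincide.
    assert taobao_segments(cs, 1, wks) == worst_case_segments(cs, wks)
    n = worst_case_segments(cs, wks)
    print(f"checkpoint_segments={cs} wal_keep_segments={wks}: "
          f"{n} segments, about {n * SEGMENT_MB} MB")
```

For the default checkpoint_segments of 3 and no wal_keep_segments, this gives 14 segments, or roughly 224 MB under the 16 MB assumption.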
 
