PostgreSQL max_wal_size and XLOG files

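When a PostgreSQL backend process executes a user transaction, data changes are first written to the buffer pool, which in PostgreSQL is shared buffers (commonly sized at about 1/4 of system memory). These modified pages do not have to be synced to disk when the transaction commits, because the commit first writes the WAL. With the WAL on disk, the data can be recovered after a crash, so the database stays safe and it matters much less whether the data pages themselves reach disk at commit time. PostgreSQL writes the data back to disk only when it needs to, for example when there are many dirty pages or after a certain time interval.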

A checkpoint also triggers flushing of xlog pages to disk.

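At a checkpoint, PostgreSQL generally flushes to disk all data that was dirtied before a given point in time, which ensures the consistency and integrity of the on-disk data.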

The checkpointer process

Checkpoint trigger conditions

/*
* OR-able request flag bits for checkpoints. The "cause" bits are used only
* for logging purposes. Note: the flags must be defined so that it's
* sensible to OR together request flags arising from different requestors.
* These are the flag bits of a checkpoint request and can be OR-ed together.
*/

/* These directly affect the behavior of CreateCheckPoint and subsidiaries */
#define CHECKPOINT_IS_SHUTDOWN 0x0001 /* Checkpoint is for shutdown */
#define CHECKPOINT_END_OF_RECOVERY 0x0002 /* Like shutdown checkpoint, but
* issued at end of WAL recovery */
#define CHECKPOINT_IMMEDIATE 0x0004 /* Do it without delays */ //finish the checkpoint ASAP, ignoring checkpoint_completion_target parameter
#define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */ //force a checkpoint even if no XLOG activity has occurred since the last one (implied by CHECKPOINT_IS_SHUTDOWN or CHECKPOINT_END_OF_RECOVERY)
#define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those
* belonging to unlogged tables */
/* These are important to RequestCheckpoint */
#define CHECKPOINT_WAIT 0x0020 /* Wait for completion */
/* These indicate the cause of a checkpoint request */
#define CHECKPOINT_CAUSE_XLOG 0x0040 /* XLOG consumption */ // checkpoint is requested due to xlog filling. (This affects logging, and in particular enables CheckPointWarning.)
#define CHECKPOINT_CAUSE_TIME 0x0080 /* Elapsed time */

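The flags above correspond to the following situations:

  1. Database shutdown
  2. End of database recovery
  3. A checkpoint forced by an administrator [CHECKPOINT_FORCE]
  4. The amount of xlog reaching the checkpoint threshold
  5. Periodic checkpoints [elapsed time >= CheckPointTimeout]
  6. All dirty pages must be flushed, including those of unlogged tables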
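Because every flag is a single bit, a request combines several of them with bitwise OR. The sketch below reuses the flag values quoted above; `describe_checkpoint_flags` and its short flag names are hypothetical, written only for this illustration, and are not part of PostgreSQL.

```c
#include <assert.h>
#include <string.h>

/* Flag values copied from the definitions quoted above. */
#define CHECKPOINT_IS_SHUTDOWN      0x0001
#define CHECKPOINT_END_OF_RECOVERY  0x0002
#define CHECKPOINT_IMMEDIATE        0x0004
#define CHECKPOINT_FORCE            0x0008
#define CHECKPOINT_FLUSH_ALL        0x0010
#define CHECKPOINT_WAIT             0x0020
#define CHECKPOINT_CAUSE_XLOG       0x0040
#define CHECKPOINT_CAUSE_TIME       0x0080

/*
 * Hypothetical helper (not a PostgreSQL function): write the names of all
 * set flags into buf, separated by single spaces, and return buf.
 */
char *
describe_checkpoint_flags(int flags, char *buf, size_t buflen)
{
    static const struct
    {
        int         bit;
        const char *name;
    } names[] = {
        {CHECKPOINT_IS_SHUTDOWN, "shutdown"},
        {CHECKPOINT_END_OF_RECOVERY, "end-of-recovery"},
        {CHECKPOINT_IMMEDIATE, "immediate"},
        {CHECKPOINT_FORCE, "force"},
        {CHECKPOINT_FLUSH_ALL, "flush-all"},
        {CHECKPOINT_WAIT, "wait"},
        {CHECKPOINT_CAUSE_XLOG, "xlog"},
        {CHECKPOINT_CAUSE_TIME, "time"}
    };
    size_t i;

    assert(buflen > 0);
    buf[0] = '\0';

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
    {
        if (flags & names[i].bit)
        {
            /* Separate entries with a space, never overrunning buf. */
            if (buf[0] != '\0')
                strncat(buf, " ", buflen - strlen(buf) - 1);
            strncat(buf, names[i].name, buflen - strlen(buf) - 1);
        }
    }
    return buf;
}
```

For example, passing `CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT` fills the buffer with `immediate force wait`, which shows how flags from different requestors can be merged into one request.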

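The checkpointer auxiliary process repeatedly checks whether the checkpoint period has elapsed and whether the xlog threshold has been reached. The period is set by checkpoint_timeout; the xlog threshold is derived from max_wal_size and checkpoint_completion_target.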
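The two automatic triggers can be pictured as a single decision function. This is a simplified sketch, not the checkpointer's actual main loop; `choose_checkpoint_cause` and its parameter names are invented for this illustration.

```c
#include <assert.h>

#define CHECKPOINT_CAUSE_XLOG 0x0040    /* XLOG consumption */
#define CHECKPOINT_CAUSE_TIME 0x0080    /* Elapsed time */

/*
 * Hypothetical helper: return the cause flags of an automatic checkpoint,
 * or 0 if no checkpoint is due yet.
 *
 *   elapsed_secs         seconds since the last checkpoint
 *   timeout_secs         checkpoint_timeout
 *   segs_since_last      WAL segments consumed since the last checkpoint
 *   checkpoint_segments  CheckPointSegments (derived from max_wal_size)
 */
int
choose_checkpoint_cause(int elapsed_secs, int timeout_secs,
                        int segs_since_last, int checkpoint_segments)
{
    int flags = 0;

    assert(timeout_secs > 0 && checkpoint_segments > 0);

    if (segs_since_last >= checkpoint_segments)
        flags |= CHECKPOINT_CAUSE_XLOG;
    if (elapsed_secs >= timeout_secs)
        flags |= CHECKPOINT_CAUSE_TIME;

    return flags;
}
```

Whichever condition is met first starts the checkpoint, so a write-heavy workload tends to see xlog-triggered checkpoints while an idle one sees timed checkpoints.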

Checkpoint-related parameters

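  • CheckPointSegments

    • The number of WAL files that triggers a checkpoint. It is computed from max_wal_size and checkpoint_completion_target.
  • checkpoint_timeout

    • The maximum interval between automatic checkpoints. The default is 5 minutes.
  • checkpoint_completion_target

    • The fraction of the interval between two checkpoints within which a checkpoint should complete. The default in the version examined here is 0.5, meaning each checkpoint should finish within 50% of the checkpoint interval.
  • checkpoint_warning

    • The default is 30 seconds. If checkpoints actually occur closer together than this, a message is written to the server log. Setting it to 0 disables the warning.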
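The arithmetic behind checkpoint_completion_target and checkpoint_warning can be sketched as follows. Both functions are hypothetical helpers written for this illustration; they restate the descriptions above rather than reproduce PostgreSQL code.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of seconds over which a timed checkpoint
 * tries to spread its writes, given checkpoint_timeout and
 * checkpoint_completion_target.
 */
double
checkpoint_write_window_secs(double timeout_secs, double completion_target)
{
    assert(completion_target >= 0.0 && completion_target <= 1.0);
    return timeout_secs * completion_target;
}

/*
 * Hypothetical helper: returns 1 if two checkpoints were close enough
 * together to be reported in the server log. A checkpoint_warning of 0
 * disables the check.
 */
int
checkpoints_too_frequent(int interval_secs, int warning_secs)
{
    return warning_secs > 0 && interval_secs < warning_secs;
}
```

With the defaults quoted above (300 seconds and 0.5), the write window is 150 seconds, and two checkpoints only 20 seconds apart would be reported because 20 is less than 30.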

max_wal_size and WAL segment files

Defaults used by xlog.c. The number of WAL files is determined from the following parameters:

max_wal_size = 1GB
min_wal_size = 80MB

  1. Every GUC parameter is checked against its minimum and maximum:

    if (newval->realval < conf->min || newval->realval > conf->max)
  2. checkpoint_completion_target: the time spent flushing dirty buffers during a checkpoint, as a fraction of the checkpoint interval

    {
    {"checkpoint_completion_target", PGC_SIGHUP, WAL_CHECKPOINTS,
    gettext_noop("Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval."),
    NULL
    },
    &CheckPointCompletionTarget,
    0.5, 0.0, 1.0,
    NULL, NULL, NULL
    },

    checkpoint_completion_target defaults to 0.5 and must lie in the range [0.0, 1.0].

  3. max_wal_size

    {
    {"max_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
    gettext_noop("Sets the WAL size that triggers a checkpoint."),
    NULL,
    GUC_UNIT_MB
    },
    &max_wal_size_mb,
    64 * (XLOG_SEG_SIZE / (1024 * 1024)), 2, MAX_KILOBYTES,
    NULL, assign_max_wal_size, NULL
    },

    max_wal_size (unit: MB):

    boot_val: 64 * (XLOG_SEG_SIZE / (1024 * 1024))

    Minimum: 2

    Maximum: MAX_KILOBYTES

    #if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
    #define MAX_KILOBYTES INT_MAX
    #else
    #define MAX_KILOBYTES (INT_MAX / 1024)
    #endif

    The assign_hook of max_wal_size is assign_max_wal_size.

  4. How the maximum number of segments between checkpoints is calculated:

    /*
    * Max distance from last checkpoint, before triggering a new xlog-based
    * checkpoint.
    * In other words: the maximum distance from the last checkpoint, counted in WAL files.
    */
    int CheckPointSegments;

    void
    assign_max_wal_size(int newval, void *extra)
    {
    max_wal_size_mb = newval;
    CalculateCheckpointSegments();
    }

    static void
    CalculateCheckpointSegments(void)
    {
    double target;
    // CheckPointCompletionTarget defaults to 0.5; valid range is [0.0, 1.0]
    target = (double) ConvertToXSegs(max_wal_size_mb) / (2.0 + CheckPointCompletionTarget);

    CheckPointSegments = (int) target;

    if (CheckPointSegments < 1)
    CheckPointSegments = 1;
    }

    /* Convert min_wal_size_mb and max wal_size_mb to equivalent segment count */
    #define ConvertToXSegs(x) \
    (x / (XLOG_SEG_SIZE / (1024 * 1024)))

    It follows that CheckPointSegments lies between 1/3 and 1/2 of ConvertToXSegs(max_wal_size_mb), and is never less than 1.
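As a worked example of the calculation in CalculateCheckpointSegments, the sketch below assumes the common 16MB segment size (XLOG_SEG_SIZE is a build-time setting, so this value is an assumption) and wraps the formula in a hypothetical function, `calc_checkpoint_segments`, that takes its inputs as arguments instead of reading globals.

```c
#include <assert.h>

/* Assumed segment size: 16MB. The real value is fixed when PostgreSQL is built. */
#define XLOG_SEG_SIZE (16 * 1024 * 1024)

/* Same conversion as the macro quoted above: megabytes to segment count. */
#define ConvertToXSegs(x) ((x) / (XLOG_SEG_SIZE / (1024 * 1024)))

/*
 * Hypothetical stand-alone version of CalculateCheckpointSegments().
 */
int
calc_checkpoint_segments(int max_wal_size_mb, double completion_target)
{
    double target;
    int    segs;

    assert(completion_target >= 0.0 && completion_target <= 1.0);

    target = (double) ConvertToXSegs(max_wal_size_mb) / (2.0 + completion_target);
    segs = (int) target;            /* truncate toward zero, as in the source */

    return (segs < 1) ? 1 : segs;   /* never below one segment */
}
```

With max_wal_size = 1GB (1024MB, i.e. 64 segments) and a completion target of 0.5, the result is 64 / 2.5 = 25.6, truncated to 25 segments. A target of 0.0 gives 32 and a target of 1.0 gives 21, which are the 1/2 and 1/3 bounds.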

  5. WAL file size and segment file replacement

    XLogFileInit

    max_segno = logsegno + CheckPointSegments;
    if (!InstallXLogFileSegment(&installed_segno, tmppath,
    *use_existent, max_segno,
    use_lock))

    InstallXLogFileSegment

    Install a new XLOG segment file as a current or future log segment.

    /* Find a free slot to put it in */
    while (stat(path, &stat_buf) == 0)
    {
    if ((*segno) >= max_segno)
    {
    /* Failed to find a free slot within specified range */
    if (use_lock)
    LWLockRelease(ControlFileLock);
    return false;
    }
    (*segno)++;
    XLogFilePath(path, ThisTimeLineID, *segno);
    }
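The slot search in InstallXLogFileSegment can be tried out without touching the file system. In the sketch below an array of booleans stands in for the `stat()` calls; `find_free_segment_slot` and its parameters are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the stat() loop quoted above.
 * exists[i] says whether a file for segment i is already present.
 * Starting at *segno, advance to the first free segment number that is
 * not beyond max_segno. Returns false if no free slot is found.
 */
bool
find_free_segment_slot(const bool *exists, unsigned long nsegs,
                       unsigned long *segno, unsigned long max_segno)
{
    /* Keep every array access inside exists[0 .. nsegs-1]. */
    assert(*segno <= max_segno && max_segno < nsegs);

    while (exists[*segno])
    {
        if (*segno >= max_segno)
            return false;       /* no free slot within the allowed range */
        (*segno)++;
    }
    return true;
}
```

If segments 0 to 2 exist and 3 is free, a search starting at 0 with max_segno = 5 stops at segment 3. With max_segno = 2 the same search fails, which in the real function means the file is not installed as a future segment.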

Removing XLOG files

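During recovery it often turns out that some xlog files cannot be found because they have been recycled and overwritten. This raises several questions:

When are xlog files removed?

How many xlog files are removed, and how many are kept?

Which xlog files must be kept?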

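  1. First, the amount of xlog generated between two checkpoints is estimated. From that estimate the highest segment number to keep preallocated is computed, so that files no longer needed can be recycled and renamed for upcoming use;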

    /*
    * Update the estimate of distance between checkpoints.
    *
    * The estimate is used to calculate the number of WAL segments to keep
    * preallocated, see XLOGFileSlop().
    */
    static void
    UpdateCheckPointDistanceEstimate(uint64 nbytes)
    {
    /*
    * To estimate the number of segments consumed between checkpoints, keep a
    * moving average of the amount of WAL generated in previous checkpoint
    * cycles. However, if the load is bursty, with quiet periods and busy
    * periods, we want to cater for the peak load. So instead of a plain
    * moving average, let the average decline slowly if the previous cycle
    * used less WAL than estimated, but bump it up immediately if it used
    * more.
    *
    * When checkpoints are triggered by max_wal_size, this should converge to
    * CheckpointSegments * XLOG_SEG_SIZE,
    *
    * Note: This doesn't pay any attention to what caused the checkpoint.
    * Checkpoints triggered manually with CHECKPOINT command, or by e.g.
    * starting a base backup, are counted the same as those created
    * automatically. The slow-decline will largely mask them out, if they are
    * not frequent. If they are frequent, it seems reasonable to count them
    * in as any others; if you issue a manual checkpoint every 5 minutes and
    * never let a timed checkpoint happen, it makes sense to base the
    * preallocation on that 5 minute interval rather than whatever
    * checkpoint_timeout is set to.
    */
    PrevCheckPointDistance = nbytes;
    if (CheckPointDistanceEstimate < nbytes) // raise the estimate immediately
    CheckPointDistanceEstimate = nbytes;
    else
    CheckPointDistanceEstimate =
    (0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes);
    }
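The asymmetric moving average is easier to see as a pure function. The sketch below restates the update rule quoted above; `next_distance_estimate` is a hypothetical name, and the byte counts in the usage note are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical pure-function version of the update rule in
 * UpdateCheckPointDistanceEstimate(): jump up at once when a cycle used
 * more WAL than estimated, otherwise decay slowly toward the new value.
 */
double
next_distance_estimate(double estimate, double nbytes)
{
    assert(estimate >= 0.0 && nbytes >= 0.0);

    if (estimate < nbytes)
        return nbytes;                          /* bump up immediately */

    return 0.90 * estimate + 0.10 * nbytes;     /* decline slowly */
}
```

Starting from an estimate of 100 units, a busy cycle of 200 units raises the estimate straight to 200. A following quiet cycle of 100 units only lowers it to about 190, so preallocation keeps catering for the peak load.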
  2. Compute the segment number that contained the previous checkpoint, then let KeepLogSeg decide which logSegNo must be retained

    /*
    * Compute a segment number from an XLogRecPtr.
    *
    * For XLByteToSeg, do the computation at face value. For XLByteToPrevSeg,
    * a boundary byte is taken to be in the previous segment. This is suitable
    * for deciding which segment to write given a pointer to a record end,
    * for example.
    */
    #define XLByteToSeg(xlrp, logSegNo) \
    logSegNo = (xlrp) / XLogSegSize

    #define XLByteToPrevSeg(xlrp, logSegNo) \
    logSegNo = ((xlrp) - 1) / XLogSegSize
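The difference between the two macros only shows at a segment boundary. The sketch below assumes 16MB segments and defines `XLogRecPtr` and `XLogSegNo` as plain 64-bit integers so that it compiles on its own; `seg_of` and `prev_seg_of` are hypothetical wrappers added so the macros can be used as expressions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-alone stand-ins for the PostgreSQL typedefs. */
typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/* Assumed segment size: 16MB. */
#define XLogSegSize ((uint64_t) 16 * 1024 * 1024)

#define XLByteToSeg(xlrp, logSegNo) \
    logSegNo = (xlrp) / XLogSegSize

#define XLByteToPrevSeg(xlrp, logSegNo) \
    logSegNo = ((xlrp) - 1) / XLogSegSize

/* Hypothetical wrapper: segment that contains byte position ptr. */
XLogSegNo
seg_of(XLogRecPtr ptr)
{
    XLogSegNo segno;

    XLByteToSeg(ptr, segno);
    return segno;
}

/* Hypothetical wrapper: a boundary byte counts as the previous segment. */
XLogSegNo
prev_seg_of(XLogRecPtr ptr)
{
    XLogSegNo segno;

    assert(ptr > 0);            /* (ptr - 1) would wrap around at 0 */
    XLByteToPrevSeg(ptr, segno);
    return segno;
}
```

For a pointer exactly at 3 * XLogSegSize, `seg_of` returns 3 while `prev_seg_of` returns 2; one byte later both return 3. That is why XLByteToPrevSeg suits a pointer to the end of a record.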

    KeepLogSeg

    /*
    * Retreat *logSegNo to the last segment that we need to retain because of
    * either wal_keep_segments or replication slots.
    *
    * This is calculated by subtracting wal_keep_segments from the given xlog
    * location, recptr and by making sure that that result is below the
    * requirement of replication slots.
    */
    static void
    KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
    {
    XLogSegNo segno;
    XLogRecPtr keep;

    XLByteToSeg(recptr, segno);
    keep = XLogGetReplicationSlotMinimumLSN(); // oldest LSN still required by a replication slot, so that segments a standby may still request are not removed

    /* compute limit for wal_keep_segments first */
    if (wal_keep_segments > 0)
    {
    /* avoid underflow, don't go below 1 */
    if (segno <= wal_keep_segments)
    segno = 1;
    else
    segno = segno - wal_keep_segments;
    }

    /* then check whether slots limit removal further */
    if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
    {
    XLogSegNo slotSegNo;

    XLByteToSeg(keep, slotSegNo); // segment that contains the LSN required by the slots

    if (slotSegNo <= 0)
    segno = 1;
    else if (slotSegNo < segno)
    segno = slotSegNo;
    }

    /* don't delete WAL segments newer than the calculated segment */
    if (segno < *logSegNo)
    *logSegNo = segno;
    }
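The retention arithmetic of KeepLogSeg can be followed with plain segment numbers. This is a simplified sketch under stated assumptions: it works on segment numbers instead of LSNs, it uses `slot_segno == 0` to mean "no replication slot restriction", and `oldest_segment_to_keep` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogSegNo;     /* stand-in for the PostgreSQL typedef */

/*
 * Hypothetical, simplified version of the KeepLogSeg() arithmetic.
 *
 *   cur_segno          segment of the current WAL position (recptr)
 *   wal_keep_segments  number of extra segments to keep, 0 disables
 *   slot_segno         oldest segment needed by any slot, 0 means none
 *   log_segno          removal limit computed so far (*logSegNo)
 *
 * Returns the new limit: the oldest segment that must be retained.
 */
XLogSegNo
oldest_segment_to_keep(XLogSegNo cur_segno, XLogSegNo wal_keep_segments,
                       XLogSegNo slot_segno, XLogSegNo log_segno)
{
    XLogSegNo segno = cur_segno;

    /* wal_keep_segments first; avoid underflow, do not go below 1 */
    if (wal_keep_segments > 0)
        segno = (segno <= wal_keep_segments) ? 1 : segno - wal_keep_segments;

    /* then let replication slots hold back removal further */
    if (slot_segno > 0 && slot_segno < segno)
        segno = slot_segno;

    /* the limit can only move backwards, never forwards */
    return (segno < log_segno) ? segno : log_segno;
}
```

With the current position in segment 100, wal_keep_segments = 10 and an incoming limit of 95, the limit retreats to 90. If a slot still needs segment 50, it retreats to 50, which is how a stalled standby can make WAL accumulate.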


  3. RemoveXlogFile uses the resulting logSegNo to recycle the older WAL files;

    XLogSegNo	endlogSegNo;
    XLogSegNo recycleSegNo;

    /*
    * Initialize info about where to try to recycle to.
    */
    XLByteToSeg(endptr, endlogSegNo); // segment number of endptr, from which recycled files are installed
    if (PriorRedoPtr == InvalidXLogRecPtr)
    recycleSegNo = endlogSegNo + 10;
    else
    recycleSegNo = XLOGfileslop(PriorRedoPtr);

    snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);

    /*
    * Before deleting the file, see if it can be recycled as a future log
    * segment. Only recycle normal files, pg_standby for example can create
    * symbolic links pointing to a separate archive directory.
    */
    if (endlogSegNo <= recycleSegNo &&
    lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
    InstallXLogFileSegment(&endlogSegNo, path,
    true, recycleSegNo, true)) // install the old file as a future WAL segment
    {
    ereport(DEBUG2,
    (errmsg("recycled write-ahead log file \"%s\"",
    segname)));
    CheckpointStats.ckpt_segs_recycled++;
    /* Needn't recheck that slot on future iterations */
    endlogSegNo++;
    }

    ....

    XLogArchiveCleanup(segname);

    How the recycleSegNo value is computed:

    /*
    * At a checkpoint, how many WAL segments to recycle as preallocated future
    * XLOG segments? Returns the highest segment that should be preallocated.
    */
    static XLogSegNo
    XLOGfileslop(XLogRecPtr PriorRedoPtr)
    {
    XLogSegNo minSegNo;
    XLogSegNo maxSegNo;
    double distance;
    XLogSegNo recycleSegNo;

    /*
    * Calculate the segment numbers that min_wal_size_mb and max_wal_size_mb
    * correspond to. Always recycle enough segments to meet the minimum, and
    * remove enough segments to stay below the maximum.
    */
    minSegNo = PriorRedoPtr / XLOG_SEG_SIZE + ConvertToXSegs(min_wal_size_mb) - 1;
    maxSegNo = PriorRedoPtr / XLOG_SEG_SIZE + ConvertToXSegs(max_wal_size_mb) - 1;

    /*
    * Between those limits, recycle enough segments to get us through to the
    * estimated end of next checkpoint.
    *
    * To estimate where the next checkpoint will finish, assume that the
    * system runs steadily consuming CheckPointDistanceEstimate bytes between
    * every checkpoint.
    *
    * The reason this calculation is done from the prior checkpoint, not the
    * one that just finished, is that this behaves better if some checkpoint
    * cycles are abnormally short, like if you perform a manual checkpoint
    * right after a timed one. The manual checkpoint will make almost a full
    * cycle's worth of WAL segments available for recycling, because the
    * segments from the prior's prior, fully-sized checkpoint cycle are no
    * longer needed. However, the next checkpoint will make only few segments
    * available for recycling, the ones generated between the timed
    * checkpoint and the manual one right after that. If at the manual
    * checkpoint we only retained enough segments to get us to the next timed
    * one, and removed the rest, then at the next checkpoint we would not
    * have enough segments around for recycling, to get us to the checkpoint
    * after that. Basing the calculations on the distance from the prior redo
    * pointer largely fixes that problem.
    */
    // CheckPointCompletionTarget defaults to 0.5; valid range is [0.0, 1.0]
    // CheckPointDistanceEstimate is the estimated amount of xlog generated between two checkpoints
    distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
    /* add 10% for good measure. */
    distance *= 1.10;

    recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) / XLOG_SEG_SIZE);

    if (recycleSegNo < minSegNo)
    recycleSegNo = minSegNo;
    if (recycleSegNo > maxSegNo)
    recycleSegNo = maxSegNo;

    return recycleSegNo;
    }
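To get a feel for the numbers XLOGfileslop produces, the calculation can be restated in whole segments. The sketch below makes several assumptions that are not in the source: segments are 16MB, the prior redo pointer sits exactly on a segment boundary, and the distance estimate is given in segments rather than bytes. `recycle_upto_segno` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_SIZE_MB 16          /* assumed 16MB segments */

/*
 * Hypothetical restatement of XLOGfileslop() in units of segments.
 *
 *   prior_redo_seg     segment of the prior checkpoint's redo pointer
 *   distance_est_segs  CheckPointDistanceEstimate, expressed in segments
 *   completion_target  checkpoint_completion_target
 */
uint64_t
recycle_upto_segno(uint64_t prior_redo_seg, double distance_est_segs,
                   double completion_target,
                   int min_wal_size_mb, int max_wal_size_mb)
{
    uint64_t min_segno;
    uint64_t max_segno;
    uint64_t whole;
    uint64_t recycle_segno;
    double   distance;

    assert(distance_est_segs >= 0.0);
    assert(min_wal_size_mb >= SEG_SIZE_MB && max_wal_size_mb >= min_wal_size_mb);

    min_segno = prior_redo_seg + (uint64_t) (min_wal_size_mb / SEG_SIZE_MB) - 1;
    max_segno = prior_redo_seg + (uint64_t) (max_wal_size_mb / SEG_SIZE_MB) - 1;

    /* estimated end of the next checkpoint, plus 10% for good measure */
    distance = (2.0 + completion_target) * distance_est_segs * 1.10;

    /* round up to a whole segment without needing libm's ceil() */
    whole = (uint64_t) distance;
    if ((double) whole < distance)
        whole++;
    recycle_segno = prior_redo_seg + whole;

    /* clamp between the min_wal_size and max_wal_size limits */
    if (recycle_segno < min_segno)
        recycle_segno = min_segno;
    if (recycle_segno > max_segno)
        recycle_segno = max_segno;

    return recycle_segno;
}
```

With the defaults (min_wal_size = 80MB, max_wal_size = 1GB, target 0.5), a prior redo pointer in segment 1000 and an estimate of 10 segments per cycle, the distance is 2.5 * 10 * 1.1 = 27.5 segments, so files are recycled up to segment 1028. An estimate of 1 segment is raised to the min_wal_size floor (1004), and an estimate of 40 segments is capped at the max_wal_size ceiling (1063).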