pg_stat_statements Source Code Analysis

Polishing the gems of technology, practicing the way of data, pursuing outstanding value


The source code of pg_stat_statements is a very instructive example: it uses a variety of hooks and is also packaged as an extension.

First, look at how initialization happens.

If postgresql.conf contains shared_preload_libraries = 'pg_stat_statements', a successful startup prints the following:

[root@server ~]# su - postgres
[postgres@server ~]$ cd /usr/local/pgsql
[postgres@server pgsql]$ ./bin/pg_ctl -D ./data start
server starting
[postgres@server pgsql]$ LOG:  loaded library "pg_stat_statements"
LOG:  database system was shut down at -- :: CST
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

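At this point (more precisely, just before the loaded library "pg_stat_statements" message is printed), the _PG_init function of pg_stat_statements runs and performs the initialization. During initialization the various hooks are set up.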

/*
 * Module load callback
 */
void
_PG_init(void)
{
    …

    /*
     * Install hooks.
     */
    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = pgss_shmem_startup;
    prev_ExecutorStart = ExecutorStart_hook;
    ExecutorStart_hook = pgss_ExecutorStart;
    prev_ExecutorRun = ExecutorRun_hook;
    ExecutorRun_hook = pgss_ExecutorRun;
    prev_ExecutorFinish = ExecutorFinish_hook;
    ExecutorFinish_hook = pgss_ExecutorFinish;
    prev_ExecutorEnd = ExecutorEnd_hook;
    ExecutorEnd_hook = pgss_ExecutorEnd;
    prev_ProcessUtility = ProcessUtility_hook;
    ProcessUtility_hook = pgss_ProcessUtility;
}
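The save-then-overwrite pattern used in _PG_init can be sketched in isolation. The following is a minimal sketch in plain C, not PostgreSQL code: every name ending in _demo is hypothetical, and an int trace stands in for the real QueryDesc argument. It only illustrates how a module remembers the previous hook, installs its own, and later falls through to the previous hook or to the standard implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for a server hook type (all *_demo names are invented). */
typedef void (*end_hook_type_demo)(int *trace);

static end_hook_type_demo ExecutorEnd_hook_demo = NULL; /* global hook slot */
static end_hook_type_demo prev_hook_demo = NULL;        /* saved by the module */

/* Default behaviour when no hook is installed: appends digit 1 to the trace. */
static void standard_end_demo(int *trace)
{
    *trace = *trace * 10 + 1;
}

/* The module's hook: appends digit 2, then chains to the previous hook
 * if there was one, otherwise to the standard implementation. */
static void my_end_hook_demo(int *trace)
{
    *trace = *trace * 10 + 2;
    if (prev_hook_demo)
        prev_hook_demo(trace);
    else
        standard_end_demo(trace);
}

/* Mirrors the "Install hooks" step: save the old value, then overwrite. */
static void install_hook_demo(void)
{
    prev_hook_demo = ExecutorEnd_hook_demo;
    ExecutorEnd_hook_demo = my_end_hook_demo;
}

/* Mirrors the dispatch done by the core code: hook if set, else standard. */
static void call_end_demo(int *trace)
{
    if (ExecutorEnd_hook_demo)
        ExecutorEnd_hook_demo(trace);
    else
        standard_end_demo(trace);
}
```

Calling call_end_demo before install_hook_demo runs only the standard function; after installation the module's hook runs first and then hands over. Saving the previous pointer is what lets several modules stack on the same hook.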

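Viewed as a whole, from the execution side, this is what happens in the postmaster once the hooks are installed:

[Figure: the postmaster calling pgss_shmem_startup through shmem_startup_hook]

In the figure above, after the postmaster process starts and finds that shmem_startup_hook is set, it calls the hook function. The function installed here is pgss_shmem_startup, so that function runs and then returns.

pgss_shmem_startup creates a hash table in shared memory and points pgss_hash at it. From then on, the postmaster's child processes can use the pgss_hash pointer to store and retrieve SQL execution information in this hash table.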

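Next, look at what happens when an SQL statement is executed.

In the postmaster child processes that serve user requests, once the hooks are installed, a SELECT/INSERT/UPDATE/DELETE statement is executed as follows after its plan has been determined:

[Figure: execution passing through the pgss_ExecutorStart, pgss_ExecutorRun, pgss_ExecutorFinish and pgss_ExecutorEnd hooks]

For statements other than SELECT/INSERT/UPDATE/DELETE (utility commands, for example CREATE TABLE), execution proceeds as follows:

[Figure: execution passing through the pgss_ProcessUtility hook]

The code shows fairly clearly how pgss_ProcessUtility gets called: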

void
ProcessUtility(Node *parsetree, const char *queryString,
               ParamListInfo params, bool isTopLevel,
               DestReceiver *dest, char *completionTag)
{
    Assert(queryString != NULL);    /* required as of 8.4 */

    /*
     * We provide a function hook variable that lets loadable plugins get
     * control when ProcessUtility is called.  Such a plugin would normally
     * call standard_ProcessUtility().
     */
    if (ProcessUtility_hook)
        (*ProcessUtility_hook) (parsetree, queryString, params,
                                isTopLevel, dest, completionTag);
    else
        standard_ProcessUtility(parsetree, queryString, params,
                                isTopLevel, dest, completionTag);
}

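The other hooks are invoked in the same way.

So how is the data about SQL execution collected? Here is an outline of the code.

When execution reaches pgss_ExecutorEnd, it calls pgss_store to save the statement's runtime information into the hash table in shared memory: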

/*
 * ExecutorEnd hook: store results if needed
 */
static void
pgss_ExecutorEnd(QueryDesc *queryDesc)
{
    if (queryDesc->totaltime && pgss_enabled())
    {
        /*
         * Make sure stats accumulation is done.  (Note: it's okay if several
         * levels of hook all do this.)
         */
        InstrEndLoop(queryDesc->totaltime);

        pgss_store(queryDesc->sourceText,
                   queryDesc->totaltime->total,
                   queryDesc->estate->es_processed,
                   &queryDesc->totaltime->bufusage);
    }

    if (prev_ExecutorEnd)
        prev_ExecutorEnd(queryDesc);
    else
        standard_ExecutorEnd(queryDesc);
}

An outline of pgss_store looks roughly like this:

/*
 * Store some statistics for a statement.
 */
static void
pgss_store(const char *query, double total_time, uint64 rows,
           const BufferUsage *bufusage)
{
    pgssHashKey key;
    double      usage;
    pgssEntry  *entry;

    Assert(query != NULL);

    /* Safety check... */
    if (!pgss || !pgss_hash)
        return;

    /* Set up key for hashtable search */
    key.userid = GetUserId();
    key.dbid = MyDatabaseId;
    key.encoding = GetDatabaseEncoding();
    key.query_len = strlen(query);

    if (key.query_len >= pgss->query_size)
        key.query_len = pg_encoding_mbcliplen(key.encoding,
                                              query,
                                              key.query_len,
                                              pgss->query_size - 1);
    key.query_ptr = query;

    usage = USAGE_EXEC(duration);

    /* Lookup the hash table entry with shared lock. */
    LWLockAcquire(pgss->lock, LW_SHARED);

    entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);

    if (!entry)
    {
        /* Must acquire exclusive lock to add a new entry. */
        LWLockRelease(pgss->lock);
        LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
        entry = entry_alloc(&key);
    }

    /* Grab the spinlock while updating the counters. */
    {
        volatile pgssEntry *e = (volatile pgssEntry *) entry;

        SpinLockAcquire(&e->mutex);
        e->counters.calls += 1;
        e->counters.total_time += total_time;
        e->counters.rows += rows;
        e->counters.shared_blks_hit += bufusage->shared_blks_hit;
        e->counters.shared_blks_read += bufusage->shared_blks_read;
        e->counters.shared_blks_written += bufusage->shared_blks_written;
        e->counters.local_blks_hit += bufusage->local_blks_hit;
        e->counters.local_blks_read += bufusage->local_blks_read;
        e->counters.local_blks_written += bufusage->local_blks_written;
        e->counters.temp_blks_read += bufusage->temp_blks_read;
        e->counters.temp_blks_written += bufusage->temp_blks_written;
        e->counters.usage += usage;
        SpinLockRelease(&e->mutex);
    }

    LWLockRelease(pgss->lock);
}
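The length-clipping step near the top of pgss_store limits the stored query text to query_size - 1 bytes without cutting a multibyte character in half. The sketch below shows that idea for UTF-8 only. It is not the implementation of pg_encoding_mbcliplen, which handles many encodings; utf8_cliplen_demo and clip_query_len_demo are hypothetical names, and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>

/*
 * Return the largest length <= limit at which s[0..len) can be cut
 * without splitting a UTF-8 character. Assumes valid UTF-8 input.
 */
static int utf8_cliplen_demo(const char *s, int len, int limit)
{
    int clip = (len < limit) ? len : limit;

    if (clip < 0)
        clip = 0;

    /*
     * If the first byte that would be dropped is a continuation byte
     * (bit pattern 10xxxxxx), the cut falls inside a character, so
     * back up until the cut sits on a character boundary.
     */
    while (clip > 0 && clip < len &&
           ((unsigned char) s[clip] & 0xC0) == 0x80)
        clip--;

    return clip;
}

/* Mirrors the clipping condition: keep at most query_size - 1 bytes. */
static int clip_query_len_demo(const char *query, int query_len, int query_size)
{
    if (query_len >= query_size)
        return utf8_cliplen_demo(query, query_len, query_size - 1);
    return query_len;
}
```

For example, the three-byte string "a\xC3\xA9" with a query_size of 3 is clipped to 1 byte rather than 2, because cutting at 2 would leave half of the two-byte character behind.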

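Comparing the members of e->counters and their definitions with the pg_stat_statements documentation below shows that they correspond one to one, apart from the usage field, which is kept internally and is not exposed as a column: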

http://www.postgresql.org/docs/9.1/static/pgstatstatements.html

/*
 * Statistics per statement
 *
 * NB: see the file read/write code before changing field order here.
 */
typedef struct pgssEntry
{
    pgssHashKey key;            /* hash key of entry - MUST BE FIRST */
    Counters    counters;       /* the statistics for this query */
    slock_t     mutex;          /* protects the counters only */
    char        query[];        /* VARIABLE LENGTH ARRAY - MUST BE LAST */
    /* Note: the allocated length of query[] is actually pgss->query_size */
} pgssEntry;

/*
 * The actual stats counters kept within pgssEntry.
 */
typedef struct Counters
{
    int64       calls;                  /* # of times executed */
    double      total_time;             /* total execution time in seconds */
    int64       rows;                   /* total # of retrieved or affected rows */
    int64       shared_blks_hit;        /* # of shared buffer hits */
    int64       shared_blks_read;       /* # of shared disk blocks read */
    int64       shared_blks_written;    /* # of shared disk blocks written */
    int64       local_blks_hit;         /* # of local buffer hits */
    int64       local_blks_read;        /* # of local disk blocks read */
    int64       local_blks_written;     /* # of local disk blocks written */
    int64       temp_blks_read;         /* # of temp blocks read */
    int64       temp_blks_written;      /* # of temp blocks written */
    double      usage;                  /* usage factor */
} Counters;

/*
 * Hashtable key that defines the identity of a hashtable entry.  The
 * hash comparators do not assume that the query string is null-terminated;
 * this lets us search for an mbcliplen'd string without copying it first.
 *
 * Presently, the query encoding is fully determined by the source database
 * and so we don't really need it to be in the key.  But that might not always
 * be true. Anyway it's notationally convenient to pass it as part of the key.
 */
typedef struct pgssHashKey
{
    Oid         userid;         /* user OID */
    Oid         dbid;           /* database OID */
    int         encoding;       /* query encoding */
    int         query_len;      /* # of valid bytes in query string */
    const char *query_ptr;      /* query string proper */
} pgssHashKey;
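The comment on pgssHashKey says the hash comparators do not assume a null-terminated query string. A minimal sketch of what such a hash function and match function could look like follows. DemoHashKey, demo_key_hash and demo_key_match are hypothetical names, and FNV-1a is swapped in here as a simple hash for illustration; it is not the hash function the module itself uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key mirroring the fields of pgssHashKey. */
typedef struct DemoHashKey
{
    uint32_t    userid;
    uint32_t    dbid;
    int         encoding;
    int         query_len;   /* number of valid bytes in query_ptr */
    const char *query_ptr;   /* not necessarily null-terminated */
} DemoHashKey;

/* FNV-1a over exactly query_len bytes, mixed with the scalar fields. */
static uint32_t demo_key_hash(const DemoHashKey *k)
{
    uint32_t h = 2166136261u;
    int      i;

    for (i = 0; i < k->query_len; i++)
    {
        h ^= (unsigned char) k->query_ptr[i];
        h *= 16777619u;
    }
    return h ^ k->userid ^ (k->dbid * 31u) ^ (uint32_t) k->encoding;
}

/* memcmp-style result: 0 when the keys are equal, nonzero otherwise. */
static int demo_key_match(const DemoHashKey *a, const DemoHashKey *b)
{
    if (a->userid == b->userid &&
        a->dbid == b->dbid &&
        a->encoding == b->encoding &&
        a->query_len == b->query_len &&
        memcmp(a->query_ptr, b->query_ptr, (size_t) a->query_len) == 0)
        return 0;
    return 1;
}
```

Because both functions read only query_len bytes, a key can point into a longer buffer (for instance a query that was clipped by length) and still match a stored entry, with no copy and no terminator needed.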

The script used when creating the extension is consistent with this as well:

CREATE FUNCTION pg_stat_statements(
    OUT userid oid,
    OUT dbid oid,
    OUT query text,
    OUT calls int8,
    OUT total_time float8,
    OUT rows int8,
    OUT shared_blks_hit int8,
    OUT shared_blks_read int8,
    OUT shared_blks_written int8,
    OUT local_blks_hit int8,
    OUT local_blks_read int8,
    OUT local_blks_written int8,
    OUT temp_blks_read int8,
    OUT temp_blks_written int8
)
RETURNS SETOF record
AS 'MODULE_PATHNAME'
LANGUAGE C;

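So how does a statement such as select * from pg_stat_statements retrieve the SQL execution information that the hook functions saved in the hash table? The answer lies in the functions that this extension declares and implements: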

Datum       pg_stat_statements_reset(PG_FUNCTION_ARGS);
Datum       pg_stat_statements(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements);

The pg_stat_statements function reads all the entries out of the hash table:

/*
 * Retrieve statement statistics.
 */
Datum
pg_stat_statements(PG_FUNCTION_ARGS)
{
    ...

    MemoryContextSwitchTo(oldcontext);

    LWLockAcquire(pgss->lock, LW_SHARED);

    hash_seq_init(&hash_seq, pgss_hash);
    while ((entry = hash_seq_search(&hash_seq)) != NULL)
    {
        Datum       values[PG_STAT_STATEMENTS_COLS];
        bool        nulls[PG_STAT_STATEMENTS_COLS];
        int         i = 0;
        Counters    tmp;

        memset(values, 0, sizeof(values));
        memset(nulls, 0, sizeof(nulls));

        values[i++] = ObjectIdGetDatum(entry->key.userid);
        values[i++] = ObjectIdGetDatum(entry->key.dbid);

        if (is_superuser || entry->key.userid == userid)
        {
            char       *qstr;

            qstr = (char *)
                pg_do_encoding_conversion((unsigned char *) entry->query,
                                          entry->key.query_len,
                                          entry->key.encoding,
                                          GetDatabaseEncoding());
            values[i++] = CStringGetTextDatum(qstr);
            if (qstr != entry->query)
                pfree(qstr);
        }
        else
            values[i++] = CStringGetTextDatum("<insufficient privilege>");

        /* copy counters to a local variable to keep locking time short */
        {
            volatile pgssEntry *e = (volatile pgssEntry *) entry;

            SpinLockAcquire(&e->mutex);
            tmp = e->counters;
            SpinLockRelease(&e->mutex);
        }

        values[i++] = Int64GetDatumFast(tmp.calls);
        values[i++] = Float8GetDatumFast(tmp.total_time);
        values[i++] = Int64GetDatumFast(tmp.rows);
        values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
        values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
        values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
        values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
        values[i++] = Int64GetDatumFast(tmp.local_blks_read);
        values[i++] = Int64GetDatumFast(tmp.local_blks_written);
        values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
        values[i++] = Int64GetDatumFast(tmp.temp_blks_written);

        Assert(i == PG_STAT_STATEMENTS_COLS);

        tuplestore_putvalues(tupstore, tupdesc, values, nulls);
    }

    LWLockRelease(pgss->lock);

    /* clean up and return the tuplestore */
    tuplestore_donestoring(tupstore);

    return (Datum) 0;
}
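Both pgss_store and pg_stat_statements hold the per-entry spinlock only for a few plain assignments: the writer bumps the counters, and the reader copies the whole Counters struct into a local variable before converting it. A minimal sketch of that pattern follows, assuming a C11 compiler and using atomic_flag as a stand-in for PostgreSQL's slock_t. All Demo and demo_ names are hypothetical and the counter set is shortened.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shortened, hypothetical counterpart of the Counters struct. */
typedef struct DemoCounters
{
    int64_t calls;
    double  total_time;
    int64_t rows;
} DemoCounters;

typedef struct DemoEntry
{
    atomic_flag  mutex;      /* protects the counters only */
    DemoCounters counters;
} DemoEntry;

static void demo_entry_init(DemoEntry *e)
{
    atomic_flag_clear(&e->mutex);
    e->counters = (DemoCounters) {0, 0.0, 0};
}

/* Spin until the flag was previously clear, i.e. we now own the lock. */
static void demo_lock(DemoEntry *e)
{
    while (atomic_flag_test_and_set_explicit(&e->mutex, memory_order_acquire))
        ;                    /* busy-wait */
}

static void demo_unlock(DemoEntry *e)
{
    atomic_flag_clear_explicit(&e->mutex, memory_order_release);
}

/* Writer side: update the counters while holding the spinlock. */
static void demo_accumulate(DemoEntry *e, double total_time, int64_t rows)
{
    demo_lock(e);
    e->counters.calls += 1;
    e->counters.total_time += total_time;
    e->counters.rows += rows;
    demo_unlock(e);
}

/* Reader side: copy the struct under the lock, format it afterwards. */
static DemoCounters demo_snapshot(DemoEntry *e)
{
    DemoCounters tmp;

    demo_lock(e);
    tmp = e->counters;
    demo_unlock(e);
    return tmp;
}
```

Copying first and converting later keeps slower work, such as building result rows, outside the critical section, so concurrent writers are blocked only for the duration of a struct copy.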

This concludes the analysis.
