The Greenplum function gp_dist_random

Reposted from: https://yq.aliyun.com/articles/7593

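What the function does:

gp_dist_random('gp_id') essentially queries gp_id on every segment, and
gp_dist_random('pg_authid') likewise queries pg_authid on every segment.

When you call a function in Greenplum, it will most likely execute on the master rather than on the segments.
Take the random() function as an example.
When it is called via select random(), the SQL does not need to be sent to the segment nodes, so the execution plan below contains no Gather Motion step.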

postgres=# explain analyze select random();
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Result  (cost=0.01..0.02 rows=1 width=0)
   Rows out:  1 rows with 0.017 ms to end, start offset by 0.056 ms.
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
           Rows out:  1 rows with 0.004 ms to end of 2 scans, start offset by 0.059 ms.
 Slice statistics:
   (slice0)    Executor memory: 29K bytes.
   (slice1)    Executor memory: 29K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 0.074 ms
(11 rows)

How can we make this function execute on the segments?
Call it through gp_dist_random('gp_id'). The argument of gp_dist_random is a queryable view or table.

postgres=# explain analyze select random() from gp_dist_random('gp_id');
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..4.00 rows=240 width=0)
   Rows out:  240 rows at destination with 6.336 ms to first row, 59 ms to end, start offset by 4195 ms.
   ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)
         Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.073 ms to first row, 0.075 ms to end, start offset by 4207 ms.
 Slice statistics:
   (slice0)    Executor memory: 471K bytes.
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 4279.445 ms
(10 rows)

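gp_id has exactly one row on every segment, so the SQL above calls random() once on each segment and returns all of the results. My test environment, for example, has 240 segments, so the SQL above returns 240 rows.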
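
To see which segment produced each value, the query can also select gp_segment_id, the Greenplum system column that reports the content ID of the segment a row was read from. The following is a minimal sketch, assuming gp_segment_id is exposed on gp_id as it is on other tables. The output depends on the cluster, so none is shown.

```sql
-- One row per segment: the segment's content ID and the value
-- that random() produced on that segment.
select gp_segment_id, random()
from gp_dist_random('gp_id')
order by gp_segment_id;

-- Counting the rows gives the number of segments that ran the query
-- (240 in the test environment described in this article).
select count(*) from gp_dist_random('gp_id');
```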

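The comments in the definition of gp_id describe how gp_dist_random can be used with it for administrative tasks.
Querying the size of a database or of a table, for example, is in fact done this way.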
src/backend/catalog/postgres_bki_srcs

/*-------------------------------------------------------------------------
 *
 * gp_id.h
 *        definition of the system "database identifier" relation (gp_dbid)
 *        along with the relation's initial contents.
 *
 * Copyright (c) 2009-2010, Greenplum inc
 *
 * NOTES
 *    Historically this table was used to supply every segment with its
 * identification information.  However in the 4.0 release when the file
 * replication feature was added it could no longer serve this purpose
 * because it became a requirement for all tables to have the same physical
 * contents on both the primary and mirror segments.  To resolve this the
 * information is now passed to each segment on startup based on the
 * gp_segment_configuration (stored on the master only), and each segment
 * has a file in its datadirectory (gp_dbid) that uniquely identifies the
 * segment.
 *
 *   The contents of the table are now irrelevant, with the exception that
 * several tools began relying on this table for use as a method of remote
 * function invocation via gp_dist_random('gp_id') due to the fact that this
 * table was guaranteed of having exactly one row on every segment.  The
 * contents of the row have no defined meaning, but this property is still
 * relied upon.
 */
#ifndef _GP_ID_H_
#define _GP_ID_H_

#include "catalog/genbki.h"

/*
 * Defines for gp_id table
 */
#define GpIdRelationName                        "gp_id"

/* TIDYCAT_BEGINFAKEDEF

   CREATE TABLE gp_id
   with (shared=true, oid=false, relid=5001, content=SEGMENT_LOCAL)
   (
   gpname       name     ,
   numsegments  smallint ,
   dbid         smallint ,
   content      smallint
   );

   TIDYCAT_ENDFAKEDEF
*/

The Greenplum functions for querying database size

postgres=# \df+ pg_database_size
                                                                                                     List of functions
   Schema   |       Name       | Result data type | Argument data types |  Type  |  Data access   | Volatility |  Owner   | Language |      Source code      |                         Description
------------+------------------+------------------+---------------------+--------+----------------+------------+----------+----------+-----------------------+-------------------------------------------------------------
 pg_catalog | pg_database_size | bigint           | name                | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_name | Calculate total disk space usage for the specified database
 pg_catalog | pg_database_size | bigint           | oid                 | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_oid  | Calculate total disk space usage for the specified database
(2 rows)

The source code of pg_database_size_name is shown below.
It clearly uses select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id'); when computing the database size.

Datum
pg_database_size_name(PG_FUNCTION_ARGS)
{
        int64           size = 0;
        Name            dbName = PG_GETARG_NAME(0);
        Oid             dbOid = get_database_oid(NameStr(*dbName));

        if (!OidIsValid(dbOid))
                ereport(ERROR,
                                (errcode(ERRCODE_UNDEFINED_DATABASE),
                                 errmsg("database \"%s\" does not exist",
                                                NameStr(*dbName))));

        size = calculate_database_size(dbOid);

        if (Gp_role == GP_ROLE_DISPATCH)
        {
                StringInfoData buffer;

                initStringInfo(&buffer);

                appendStringInfo(&buffer, "select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');", NameStr(*dbName));

                size += get_size_from_segDBs(buffer.data);
        }

        PG_RETURN_INT64(size);
}

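To check this, we can run the SQL directly. The result is almost the same as the one returned by pg_database_size; the only difference is the calculate_database_size part, that is, the size computed locally on the node that dispatches the query.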

postgres=# select sum(pg_database_size('postgres'))::int8 from gp_dist_random('gp_id');
      sum
----------------
 16006753522624
(1 row)

postgres=# select pg_database_size('postgres');
 pg_database_size
------------------
   16006763924106
(1 row)
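
Subtracting the two results above gives the locally computed share, assuming the sizes did not change between the two queries. The second statement is a sketch of a per-segment breakdown built on the same gp_dist_random('gp_id') pattern. It assumes the gp_segment_id system column is available on gp_id, and its output depends on the cluster, so none is shown.

```sql
-- Local share: the pg_database_size result minus the segment-only sum.
select 16006763924106 - 16006753522624 as master_local_bytes;
-- 10401482 bytes, roughly 10 MB

-- Size of the postgres database on each segment, largest first.
-- Useful for spotting segments that hold noticeably more data than others.
select gp_segment_id, pg_database_size('postgres') as bytes
from gp_dist_random('gp_id')
order by bytes desc;
```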

gp_dist_random('gp_id') essentially queries gp_id on every segment, and
gp_dist_random('pg_authid') likewise queries pg_authid on every segment.
For example:

postgres=# select * from gp_dist_random('gp_id');
  gpname   | numsegments | dbid | content
-----------+-------------+------+---------
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
......

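If you do not want too many rows returned, you can use limit to restrict the output, but the query is still executed on every segment, as the following plan shows: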

postgres=# explain analyze select random() from gp_dist_random('gp_id') limit 1;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=0)
   Rows out:  1 rows with 5.865 ms to first row, 5.884 ms to end, start offset by 4212 ms.
   ->  Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..0.04 rows=1 width=0)
         Rows out:  1 rows at destination with 5.857 ms to end, start offset by 4212 ms.
         ->  Limit  (cost=0.00..0.02 rows=1 width=0)
               Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.062 ms to first row, 0.063 ms to end, start offset by 4228 ms.
               ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.060 ms to end, start offset by 4228 ms.
 Slice statistics:
   (slice0)    Executor memory: 463K bytes.
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 4288.007 ms
(14 rows)
