Introduction to Deepgreen DB (Repost)

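Deepgreen DB, in full Vitesse Deepgreen DB, is a scalable massively parallel processing (MPP) data warehouse solution derived from the open-source data warehouse project Greenplum DB (commonly called GP or GPDB). Anyone already familiar with GP can therefore switch to Deepgreen with little friction.

It offers almost all of GP's features. While retaining GP's strengths, Deepgreen optimizes the original query processing engine. The new-generation query processing engine adds:

Superior join and aggregation algorithms
A new spill-handling subsystem
JIT-based query optimization, vectorized scans, and data-path optimization

The main features of Deepgreen are introduced briefly below, mostly in comparison with Greenplum: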

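1. 100% GPDB

Deepgreen is almost 100% compatible with Greenplum. "Almost", because Deepgreen drops a few Greenplum features of marginal value, such as MapReduce support; what remains is the essential part. From SQL syntax and stored-procedure syntax, to the data storage format, to components such as gpstart and gpfdist, Deepgreen keeps the migration impact as low as possible for users coming from Greenplum. In particular:

Data does not need to be reloaded, except data compressed with quicklz, which must be changed
DML and DDL statements are unchanged
UDF (user-defined function) syntax is unchanged
Stored-procedure syntax is unchanged
Connection and authorization protocols such as JDBC/ODBC are unchanged
Operational scripts (for example, backup scripts) are unchanged

So where does Deepgreen differ from Greenplum? In one word: fast! fast! fast! (important things get said three times). Because most OLAP workloads are CPU-bound, Deepgreen, with its CPU-oriented optimizations, can run 3 to 5 times faster than the original Greenplum in performance tests.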

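2. Faster Decimal types

Deepgreen provides two exact Decimal types, Decimal64 and Decimal128, which are more efficient than Greenplum's existing decimal type (Numeric). Because they are more precise than the float/double types, they are better suited to domains such as banking that demand accurate data.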
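Why exactness matters can be illustrated outside the database. The sketch below uses Python's standard decimal module purely as a stand-in; it is not Deepgreen's Decimal64/Decimal128 implementation, and the amounts are hypothetical values chosen for illustration.

```python
from decimal import Decimal

# Ten payments of 0.10 each (hypothetical amounts, for illustration only).
float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.10")] * 10)

# Binary floating point cannot represent 0.1 exactly, so a small error
# accumulates in the float total; the decimal total stays exact.
print(float_total == 1.0)                 # False
print(decimal_total == Decimal("1.00"))   # True
```

The same effect is why a float column can drift in the last digits of a large sum while a decimal column does not.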

Installation:

After the database has been initialized, load these two data types into each database that needs them with the following commands:

dgadmin@flash:~$ source deepgreendb/greenplum_path.sh
dgadmin@flash:~$ cd $GPHOME/share/postgresql/contrib/
dgadmin@flash:~/deepgreendb/share/postgresql/contrib$ psql postgres -f pg_decimal.sql

A quick test:

Query used: select avg(x), sum(2*x) from table

Data volume: 1 million rows

dgadmin@flash:~$ psql -d postgres
psql (8.2.15)
Type "help" for help.

postgres=# drop table if exists tt;
NOTICE:  table "tt" does not exist, skipping
DROP TABLE

postgres=# create table tt(
postgres(# ii bigint,
postgres(#  f64 double precision,
postgres(# d64 decimal64,
postgres(# d128 decimal128,
postgres(# n numeric(15, 3))
postgres-# distributed randomly;
CREATE TABLE

postgres=# insert into tt
postgres-# select i,
postgres-#     i + 0.123,
postgres-#     (i + 0.123)::decimal64,
postgres-#     (i + 0.123)::decimal128,
postgres-#     i + 0.123
postgres-# from generate_series(1, 1000000) i;
INSERT 0 1000000

postgres=# \timing on
Timing is on.

postgres=# select count(*) from tt;
  count
---------
 1000000
(1 row)
Time: 161.500 ms

postgres=# set vitesse.enable=1;
SET
Time: 1.695 ms

postgres=# select avg(f64),sum(2*f64) from tt;
       avg        |       sum
------------------+------------------
 500000.622996815 | 1000001245993.63
(1 row)
Time: 45.368 ms

postgres=# select avg(d64),sum(2*d64) from tt;
    avg     |        sum
------------+-------------------
 500000.623 | 1000001246000.000
(1 row)
Time: 135.693 ms

postgres=# select avg(d128),sum(2*d128) from tt;
    avg     |        sum
------------+-------------------
 500000.623 | 1000001246000.000
(1 row)
Time: 148.286 ms

postgres=# set vitesse.enable=1;
SET
Time: 11.691 ms

postgres=# select avg(n),sum(2*n) from tt;
         avg         |        sum
---------------------+-------------------
 500000.623000000000 | 1000001246000.000
(1 row)
Time: 154.189 ms

postgres=# set vitesse.enable=0;
SET
Time: 1.426 ms

postgres=# select avg(n),sum(2*n) from tt;
         avg         |        sum
---------------------+-------------------
 500000.623000000000 | 1000001246000.000
(1 row)
Time: 296.291 ms

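Results:

45 ms - 64-bit float
136 ms - decimal64
148 ms - decimal128
154 ms - Deepgreen numeric
296 ms - Greenplum numeric

In the test above, decimal64 (136 ms) is faster than Deepgreen numeric (154 ms) and roughly twice as fast as Greenplum numeric (296 ms); per the original post, it is more than 5 times faster in production environments.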
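The ratios behind these statements, and the exact answers the queries should return, can be checked with a few lines of Python. This is only a worked calculation on the numbers printed in the session; the dictionary keys and function names are made up for this sketch.

```python
from decimal import Decimal

# Timings in milliseconds, copied from the psql session above.
timings_ms = {
    "float64": 45.368,
    "decimal64": 135.693,
    "decimal128": 148.286,
    "deepgreen numeric": 154.189,
    "greenplum numeric": 296.291,
}

def speedup(baseline, candidate):
    """Return how many times faster `candidate` ran than `baseline`."""
    return timings_ms[baseline] / timings_ms[candidate]

# Exact expected results for x = i + 0.123 with i = 1..n, using the
# closed form sum(i) = n(n+1)/2. Decimal keeps the arithmetic exact.
n = 1_000_000
total = Decimal(n * (n + 1) // 2) + Decimal("0.123") * n
expected_avg = total / n
expected_sum_2x = 2 * total

print(round(speedup("greenplum numeric", "decimal64"), 2))
print(round(speedup("deepgreen numeric", "decimal64"), 2))
print(expected_avg, expected_sum_2x)
```

The exact values agree with the decimal64, decimal128, and numeric outputs, while the double precision column differs in its last digits.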

3. JSON support

Deepgreen supports the JSON type, but not completely. The unsupported functions are: json_each, json_each_text, json_extract_path, json_extract_path_text, json_object_keys, json_populate_record, json_populate_recordset, json_array_elements, and json_agg.

Installation:

Run the following command to add JSON support:

dgadmin@flash:~$ psql postgres -f $GPHOME/share/postgresql/contrib/json.sql

A quick test:

dgadmin@flash:~$ psql postgres
psql (8.2.15)
Type "help" for help.

postgres=# select '[1,2,3]'::json->2;
 ?column?
----------
 3
(1 row)

postgres=# create temp table mytab(i int, j json) distributed by (i);
CREATE TABLE

postgres=# insert into mytab values (1, null), (2, '[2,3,4]'), (3, '[3000,4000,5000]');
INSERT 0 3

postgres=# insert into mytab values (1, null), (2, '[2,3,4]'), (3, '[3000,4000,5000]');
INSERT 0 3

postgres=# select i, j->2 from mytab;
 i | ?column?
---+----------
 2 | 4
 2 | 4
 1 |
 3 | 5000
 1 |
 3 | 5000
(6 rows)
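The session shows that the -> operator with an integer indexes a JSON array from zero and yields NULL for a NULL input. A minimal Python sketch of that behaviour follows; the function name json_array_get is hypothetical, and returning None for non-arrays or out-of-range indexes is an assumption, since the session does not exercise those cases.

```python
import json

def json_array_get(text, index):
    """Mimic `text::json -> index` for the cases shown in the session."""
    if text is None:
        return None                      # NULL in, NULL out
    value = json.loads(text)
    # Assumption: anything other than an in-range array index yields None.
    if not isinstance(value, list) or not 0 <= index < len(value):
        return None
    return value[index]                  # zero-based, so index 2 is the third element

rows = [(1, None), (2, "[2,3,4]"), (3, "[3000,4000,5000]")]
for i, j in rows:
    print(i, json_array_get(j, 2))
```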

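4. Efficient compression algorithms

Deepgreen retains Greenplum's zlib compression for storage. In addition, Deepgreen offers two compression formats better suited to database workloads: zstd and lz4.

If you need a better compression ratio for column-oriented or append-only heap tables, choose zstd. Compared with zlib, zstd achieves a better compression ratio and uses the CPU more efficiently.

If the workload is read-heavy, choose lz4 for its remarkable decompression speed. Although lz4's compression ratio is not as good as that of zlib or zstd, the sacrifice is worthwhile for read-heavy workloads.

For details on these two algorithms, see their home pages:

zstd home page http://facebook.github.io/zstd/
lz4 home page http://lz4.github.io/lz4/

A quick test:

This is a simple test of only four options: no compression, zlib, zstd, and lz4. My machine is not powerful, so all results are for reference only: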

postgres=# create temp table ttnone (
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=none))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE

postgres=# \timing on
Timing is on.

postgres=# create temp table ttzlib(
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=zlib, compresslevel=1))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 762.596 ms

postgres=# create temp table ttzstd (
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=zstd, compresslevel=1))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 827.033 ms

postgres=# create temp table ttlz4 (
postgres(#     i int,
postgres(#     t text,
postgres(#     default column encoding (compresstype=lz4))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 845.728 ms

postgres=# insert into ttnone select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 104641.369 ms

postgres=# insert into ttzlib select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 99557.505 ms

postgres=# insert into ttzstd select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 98800.567 ms

postgres=# insert into ttlz4 select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 96886.107 ms

postgres=# select pg_size_pretty(pg_relation_size('ttnone'));
 pg_size_pretty
----------------
 1708 MB
(1 row)
Time: 83.411 ms

postgres=# select pg_size_pretty(pg_relation_size('ttzlib'));
 pg_size_pretty
----------------
 374 MB
(1 row)
Time: 4.641 ms

postgres=# select pg_size_pretty(pg_relation_size('ttzstd'));
 pg_size_pretty
----------------
 325 MB
(1 row)
Time: 5.015 ms

postgres=# select pg_size_pretty(pg_relation_size('ttlz4'));
 pg_size_pretty
----------------
 785 MB
(1 row)
Time: 4.483 ms

postgres=# select sum(length(t)) from ttnone;
    sum
------------
 1288888898
(1 row)
Time: 4414.965 ms

postgres=# select sum(length(t)) from ttzlib;
    sum
------------
 1288888898
(1 row)
Time: 4500.671 ms

postgres=# select sum(length(t)) from ttzstd;
    sum
------------
 1288888898
(1 row)
Time: 3849.648 ms

postgres=# select sum(length(t)) from ttlz4;
    sum
------------
 1288888898
(1 row)
Time: 3160.477 ms
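To make the numbers above easier to compare, the following Python sketch turns the reported sizes and scan times into ratios, and independently recomputes the sum(length(t)) value as a sanity check on the test data. It only works on figures copied from the session; the names are made up for this sketch, and single-run timings on one machine should be read with caution.

```python
# Sizes (MB) and scan times (ms) copied from the psql session above.
size_mb = {"none": 1708, "zlib": 374, "zstd": 325, "lz4": 785}
scan_ms = {"none": 4414.965, "zlib": 4500.671, "zstd": 3849.648, "lz4": 3160.477}

def compression_ratio(codec):
    """Uncompressed size divided by compressed size."""
    return size_mb["none"] / size_mb[codec]

def total_text_length(n, prefix="user "):
    """Total length of prefix + str(i) for i = 1..n, without looping over n.

    Counts how many integers have 1 digit, 2 digits, and so on.
    """
    total = len(prefix) * n
    digits, start = 1, 1
    while start <= n:
        end = min(n, start * 10 - 1)     # last integer with this many digits
        total += (end - start + 1) * digits
        start *= 10
        digits += 1
    return total

for codec in ("zlib", "zstd", "lz4"):
    print(codec,
          round(compression_ratio(codec), 2),
          round(scan_ms[codec] / scan_ms["none"], 2))

print(total_text_length(100_000_000))    # 1288888898, matching sum(length(t))
```

In this run zstd gives the smallest table and lz4 the fastest scan, which is consistent with the guidance given earlier in this section.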

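5. Data sampling

Starting with Deepgreen version 16.16, true data sampling is built in through SQL. You can sample either by specifying a number of rows or by specifying a sampling percentage:

SELECT {select-clauses} LIMIT SAMPLE {n} ROWS;
SELECT {select-clauses} LIMIT SAMPLE {n} PERCENT;

A quick test: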

postgres=# select count(*) from ttlz4;
   count
-----------
 100000000
(1 row)
Time: 903.661 ms

postgres=# select * from ttlz4 limit sample 0.00001 percent;
    i     |       t
----------+---------------
  3442917 | user 3442917
  9182620 | user 9182620
  9665879 | user 9665879
 13791056 | user 13791056
 15669131 | user 15669131
 16234351 | user 16234351
 19592531 | user 19592531
 39097955 | user 39097955
 48822058 | user 48822058
 83021724 | user 83021724
  1342299 | user 1342299
 20309120 | user 20309120
 34448511 | user 34448511
 38060122 | user 38060122
 69084858 | user 69084858
 73307236 | user 73307236
 95421406 | user 95421406
(17 rows)
Time: 4208.847 ms

postgres=# select * from ttlz4 limit sample 10 rows;
    i     |       t
----------+---------------
 78259144 | user 78259144
 85551752 | user 85551752
 90848887 | user 90848887
 53923527 | user 53923527
 46524603 | user 46524603
 31635115 | user 31635115
 19030885 | user 19030885
 97877732 | user 97877732
 33238448 | user 33238448
 20916240 | user 20916240
(10 rows)
Time: 3578.031 ms
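A quick calculation helps in reading the PERCENT result. The sketch below computes the row count that an exact proportion would give; the function name is hypothetical. The session returned 17 rows where the exact proportion is 10, which suggests the percentage is applied approximately rather than as an exact count, but the original post does not describe how sampling is implemented, so that reading is an assumption.

```python
from decimal import Decimal

def expected_sample_rows(table_rows, percent):
    """Row count for an exact `percent` share of `table_rows`.

    `percent` is passed as a string so that Decimal parses it exactly
    instead of inheriting binary floating-point rounding.
    """
    return Decimal(table_rows) * Decimal(percent) / Decimal(100)

# 0.00001 percent of the 100,000,000 rows in ttlz4.
print(expected_sample_rows(100_000_000, "0.00001"))
```

When an exact number of rows is required, the ROWS form shown in the session returned exactly the 10 rows requested.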

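6. TPC-H performance

For a performance comparison between Deepgreen and Greenplum, see my two other posts:

"Deepgreen vs. Greenplum TPC-H Performance Comparison (using 德哥's scripts)"

"Deepgreen vs. Greenplum TPC-H Performance Comparison (using VitesseData's scripts)"

Xdrive, the high-performance component that ships with Deepgreen, will be covered in a separate post later.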

End~
