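KingbaseES Routines: Loading Data into a Table with Many Indexes
Overview
How can you quickly insert a large volume of data, say tens or hundreds of millions of rows, into a table that carries indexes?
Data preparation
Prepare a table with twenty indexes.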
kingbase=# \d+ bigtab
Table "kingbase.bigtab"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
c01 | integer | | | | plain | |
c02 | integer | | | | plain | |
c03 | integer | | | | plain | |
c04 | integer | | | | plain | |
c05 | integer | | | | plain | |
c06 | integer | | | | plain | |
c07 | integer | | | | plain | |
c08 | integer | | | | plain | |
c09 | integer | | | | plain | |
c10 | integer | | | | plain | |
c11 | integer | | | | plain | |
c12 | integer | | | | plain | |
c13 | integer | | | | plain | |
c14 | integer | | | | plain | |
c15 | integer | | | | plain | |
c16 | integer | | | | plain | |
c17 | integer | | | | plain | |
c18 | integer | | | | plain | |
c19 | integer | | | | plain | |
c20 | integer | | | | plain | |
c21 | integer | | | | plain | |
c22 | integer | | | | plain | |
c23 | integer | | | | plain | |
c24 | integer | | | | plain | |
c25 | integer | | | | plain | |
c26 | integer | | | | plain | |
c27 | integer | | | | plain | |
c28 | integer | | | | plain | |
c29 | integer | | | | plain | |
t01 | text | | | | extended | |
t02 | text | | | | extended | |
t03 | text | | | | extended | |
t04 | text | | | | extended | |
t05 | text | | | | extended | |
t06 | text | | | | extended | |
t07 | text | | | | extended | |
t08 | text | | | | extended | |
t09 | text | | | | extended | |
t10 | text | | | | extended | |
t11 | text | | | | extended | |
t12 | text | | | | extended | |
t13 | text | | | | extended | |
t14 | text | | | | extended | |
t15 | text | | | | extended | |
t16 | text | | | | extended | |
t17 | text | | | | extended | |
t18 | text | | | | extended | |
t19 | text | | | | extended | |
t20 | text | | | | extended | |
Indexes:
"bigtab_i01" btree (c01)
"bigtab_i02" btree (c02)
"bigtab_i03" btree (c03)
"bigtab_i04" btree (c04)
"bigtab_i05" btree (c05)
"bigtab_i06" btree (c06)
"bigtab_i07" btree (c07)
"bigtab_i08" btree (c08)
"bigtab_i09" btree (c09)
"bigtab_i10" btree (c10)
"bigtab_i11" btree (c11)
"bigtab_i12" btree (c12)
"bigtab_i13" btree (c13)
"bigtab_i14" btree (c14)
"bigtab_i15" btree (c15)
"bigtab_i16" btree (c16)
"bigtab_i17" btree (c17)
"bigtab_i18" btree (c18)
"bigtab_i19" btree (c19)
"bigtab_i20" btree (c20)
Access method: heap
kingbase=#
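The original shows only the \d+ output, not the DDL. The following is a minimal sketch of one way to build a table with the same shape (id, c01 to c29 as integer, t01 to t20 as text, and btree indexes bigtab_i01 to bigtab_i20 on c01 to c20). It assumes the PostgreSQL-compatible functions format, lpad, string_agg and generate_series are available in your KingbaseES mode; the author's actual DDL may have differed.

```sql
-- Sketch only: rebuilds the table layout shown by \d+ above.
do
$$
declare
    cols   text;
    suffix text;
    i      int;
begin
    -- Assemble "c01 integer, ..., c29 integer, t01 text, ..., t20 text".
    select string_agg(col, ', ' order by ord)
      into cols
      from (
            select n as ord,
                   format('c%s integer', lpad(n::text, 2, '0')) as col
              from generate_series(1, 29) n
            union all
            select 100 + n,
                   format('t%s text', lpad(n::text, 2, '0'))
              from generate_series(1, 20) n
           ) s;

    execute format('create table bigtab (id integer, %s)', cols);

    -- One btree index per column c01..c20, named bigtab_i01..bigtab_i20.
    for i in 1 .. 20 loop
        suffix := lpad(i::text, 2, '0');
        execute format('create index bigtab_i%s on bigtab (c%s)',
                       suffix, suffix);
    end loop;
end;
$$;
```

Generating the DDL in a loop keeps the sketch short; spelling out the fifty column definitions by hand gives the same result.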
Method 1: insert the data directly and let the indexes be maintained automatically
kingbase=#
kingbase=# insert into bigtab
kingbase-# select id
kingbase-# , (random() * 100)::int + 1000 c01
kingbase-# , (random() * 200)::int + 1000 c02
kingbase-# , (random() * 300)::int + 10000 c03
kingbase-# , (random() * 400)::int + 10000 c04
kingbase-# , (random() * 500)::int + 10000 c05
kingbase-# , (random() * 600)::int + 10000 c06
kingbase-# , (random() * 700)::int + 10000 c07
kingbase-# , (random() * 800)::int + 10000 c08
kingbase-# , (random() * 900)::int + 10000 c09
kingbase-# , (random() * 1000)::int + 10000 c10
kingbase-# , (random() * 2000)::int + 10000 c11
kingbase-# , (random() * 3000)::int + 10000 c12
kingbase-# , (random() * 4000)::int + 10000 c13
kingbase-# , (random() * 5000)::int + 10000 c14
kingbase-# , (random() * 6000)::int + 10000 c15
kingbase-# , (random() * 7000)::int + 10000 c16
kingbase-# , (random() * 8000)::int + 10000 c17
kingbase-# , (random() * 9000)::int + 10000 c18
kingbase-# , (random() * 10000)::int + 10000 c19
kingbase-# , (random() * 20000)::int + 10000 c20
kingbase-# , (random() * 30000)::int + 10000 c21
kingbase-# , (random() * 40000)::int + 10000 c22
kingbase-# , (random() * 50000)::int + 10000 c23
kingbase-# , (random() * 60000)::int + 10000 c24
kingbase-# , (random() * 70000)::int + 10000 c25
kingbase-# , (random() * 80000)::int + 10000 c26
kingbase-# , (random() * 90000)::int + 10000 c27
kingbase-# , (random() * 10000)::int + 10000 c28
kingbase-# , (random() * 10000)::int + 10000 c29
kingbase-# , md5(random()::text) t01
kingbase-# , md5(random()::text) t02
kingbase-# , md5(random()::text) t03
kingbase-# , md5(random()::text) t04
kingbase-# , md5(random()::text) t05
kingbase-# , md5(random()::text) t06
kingbase-# , md5(random()::text) t07
kingbase-# , md5(random()::text) t08
kingbase-# , md5(random()::text) t09
kingbase-# , md5(random()::text) t10
kingbase-# , md5(random()::text) t11
kingbase-# , md5(random()::text) t12
kingbase-# , md5(random()::text) t13
kingbase-# , md5(random()::text) t14
kingbase-# , md5(random()::text) t15
kingbase-# , md5(random()::text) t16
kingbase-# , md5(random()::text) t17
kingbase-# , md5(random()::text) t18
kingbase-# , md5(random()::text) t19
kingbase-# , md5(random()::text) t20
kingbase-# from generate_series(1, 2000000) id;
INSERT 0 2000000
Time: 299331.143 ms (04:59.331)
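Pros: a single statement; indexes are maintained automatically; indexes added later are covered automatically.
Cons: the indexes are maintained row by row, so the load takes a long time.
Method 2: drop the indexes, insert the data, then recreate the indexes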
kingbase=#
kingbase=# do
kingbase-# $$
kingbase$# begin
kingbase$# drop index bigtab_i01;
kingbase$# drop index bigtab_i02;
kingbase$# drop index bigtab_i03;
kingbase$# drop index bigtab_i04;
kingbase$# drop index bigtab_i05;
kingbase$# drop index bigtab_i06;
kingbase$# drop index bigtab_i07;
kingbase$# drop index bigtab_i08;
kingbase$# drop index bigtab_i09;
kingbase$# drop index bigtab_i10;
kingbase$# drop index bigtab_i11;
kingbase$# drop index bigtab_i12;
kingbase$# drop index bigtab_i13;
kingbase$# drop index bigtab_i14;
kingbase$# drop index bigtab_i15;
kingbase$# drop index bigtab_i16;
kingbase$# drop index bigtab_i17;
kingbase$# drop index bigtab_i18;
kingbase$# drop index bigtab_i19;
kingbase$# drop index bigtab_i20;
kingbase$#
kingbase$# insert into bigtab
kingbase$# select id
kingbase$# , (random() * 100)::int + 1000 c01
kingbase$# , (random() * 200)::int + 1000 c02
kingbase$# , (random() * 300)::int + 10000 c03
kingbase$# , (random() * 400)::int + 10000 c04
kingbase$# , (random() * 500)::int + 10000 c05
kingbase$# , (random() * 600)::int + 10000 c06
kingbase$# , (random() * 700)::int + 10000 c07
kingbase$# , (random() * 800)::int + 10000 c08
kingbase$# , (random() * 900)::int + 10000 c09
kingbase$# , (random() * 1000)::int + 10000 c10
kingbase$# , (random() * 2000)::int + 10000 c11
kingbase$# , (random() * 3000)::int + 10000 c12
kingbase$# , (random() * 4000)::int + 10000 c13
kingbase$# , (random() * 5000)::int + 10000 c14
kingbase$# , (random() * 6000)::int + 10000 c15
kingbase$# , (random() * 7000)::int + 10000 c16
kingbase$# , (random() * 8000)::int + 10000 c17
kingbase$# , (random() * 9000)::int + 10000 c18
kingbase$# , (random() * 10000)::int + 10000 c19
kingbase$# , (random() * 20000)::int + 10000 c20
kingbase$# , (random() * 30000)::int + 10000 c21
kingbase$# , (random() * 40000)::int + 10000 c22
kingbase$# , (random() * 50000)::int + 10000 c23
kingbase$# , (random() * 60000)::int + 10000 c24
kingbase$# , (random() * 70000)::int + 10000 c25
kingbase$# , (random() * 80000)::int + 10000 c26
kingbase$# , (random() * 90000)::int + 10000 c27
kingbase$# , (random() * 10000)::int + 10000 c28
kingbase$# , (random() * 10000)::int + 10000 c29
kingbase$# , md5(random()::text) t01
kingbase$# , md5(random()::text) t02
kingbase$# , md5(random()::text) t03
kingbase$# , md5(random()::text) t04
kingbase$# , md5(random()::text) t05
kingbase$# , md5(random()::text) t06
kingbase$# , md5(random()::text) t07
kingbase$# , md5(random()::text) t08
kingbase$# , md5(random()::text) t09
kingbase$# , md5(random()::text) t10
kingbase$# , md5(random()::text) t11
kingbase$# , md5(random()::text) t12
kingbase$# , md5(random()::text) t13
kingbase$# , md5(random()::text) t14
kingbase$# , md5(random()::text) t15
kingbase$# , md5(random()::text) t16
kingbase$# , md5(random()::text) t17
kingbase$# , md5(random()::text) t18
kingbase$# , md5(random()::text) t19
kingbase$# , md5(random()::text) t20
kingbase$# from generate_series(1, 2000000) id;
kingbase$#
kingbase$# create index bigtab_i01 on bigtab (c01);
kingbase$# create index bigtab_i02 on bigtab (c02);
kingbase$# create index bigtab_i03 on bigtab (c03);
kingbase$# create index bigtab_i04 on bigtab (c04);
kingbase$# create index bigtab_i05 on bigtab (c05);
kingbase$# create index bigtab_i06 on bigtab (c06);
kingbase$# create index bigtab_i07 on bigtab (c07);
kingbase$# create index bigtab_i08 on bigtab (c08);
kingbase$# create index bigtab_i09 on bigtab (c09);
kingbase$# create index bigtab_i10 on bigtab (c10);
kingbase$# create index bigtab_i11 on bigtab (c11);
kingbase$# create index bigtab_i12 on bigtab (c12);
kingbase$# create index bigtab_i13 on bigtab (c13);
kingbase$# create index bigtab_i14 on bigtab (c14);
kingbase$# create index bigtab_i15 on bigtab (c15);
kingbase$# create index bigtab_i16 on bigtab (c16);
kingbase$# create index bigtab_i17 on bigtab (c17);
kingbase$# create index bigtab_i18 on bigtab (c18);
kingbase$# create index bigtab_i19 on bigtab (c19);
kingbase$# create index bigtab_i20 on bigtab (c20);
kingbase$#
kingbase$# end;
kingbase$# $$;
ANONYMOUS BLOCK
Time: 83069.170 ms (01:23.069)
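Pros: the indexes are built in bulk; this is the fastest of the three methods.
Cons: the statements are complex and hard-coded; the drop and create index statements must be maintained by hand; indexes added later are not covered.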
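One way to soften the hard-coding drawback is to read the index definitions from the catalog instead of listing them by hand. This is a sketch, not the author's method: it assumes pg_index and pg_get_indexdef behave as they do in PostgreSQL, and the one-column insert is a stand-in for the full insert statement used above.

```sql
-- Sketch only: drop and recreate whatever non-unique indexes bigtab has.
do
$$
declare
    names text[];
    defs  text[];
    i     int;
begin
    -- Capture every name and full definition before dropping anything.
    select array_agg(indexrelid::regclass::text order by indexrelid),
           array_agg(pg_get_indexdef(indexrelid) order by indexrelid)
      into names, defs
      from pg_index
     where indrelid = 'bigtab'::regclass
       and not indisunique       -- keep primary-key and unique indexes
       and not indisexclusion;   -- keep indexes backing exclusion constraints

    -- array_length is null when there are no indexes, so the loops are skipped.
    for i in 1 .. coalesce(array_length(names, 1), 0) loop
        execute 'drop index ' || names[i];
    end loop;

    -- Stand-in for the bulk load; replace with the full insert shown above.
    insert into bigtab (id)
    select id from generate_series(1, 2000000) id;

    -- Recreate each index from its saved definition.
    for i in 1 .. coalesce(array_length(defs, 1), 0) loop
        execute defs[i];
    end loop;
end;
$$;
```

Because the whole block runs in one transaction, an error during the load or the rebuild rolls back the drops as well, so the table is not left without its indexes.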
Method 3: disable index updates, insert the data, then rebuild all of the table's indexes
kingbase=# do
kingbase-# $$
kingbase$# begin
kingbase$#
kingbase$# update pg_index
kingbase$# set indislive= false
kingbase$# where indrelid = 'bigtab'::regclass;
kingbase$#
kingbase$# insert into bigtab
kingbase$# select id
kingbase$# , (random() * 100)::int + 1000 c01
kingbase$# , (random() * 200)::int + 1000 c02
kingbase$# , (random() * 300)::int + 10000 c03
kingbase$# , (random() * 400)::int + 10000 c04
kingbase$# , (random() * 500)::int + 10000 c05
kingbase$# , (random() * 600)::int + 10000 c06
kingbase$# , (random() * 700)::int + 10000 c07
kingbase$# , (random() * 800)::int + 10000 c08
kingbase$# , (random() * 900)::int + 10000 c09
kingbase$# , (random() * 1000)::int + 10000 c10
kingbase$# , (random() * 2000)::int + 10000 c11
kingbase$# , (random() * 3000)::int + 10000 c12
kingbase$# , (random() * 4000)::int + 10000 c13
kingbase$# , (random() * 5000)::int + 10000 c14
kingbase$# , (random() * 6000)::int + 10000 c15
kingbase$# , (random() * 7000)::int + 10000 c16
kingbase$# , (random() * 8000)::int + 10000 c17
kingbase$# , (random() * 9000)::int + 10000 c18
kingbase$# , (random() * 10000)::int + 10000 c19
kingbase$# , (random() * 20000)::int + 10000 c20
kingbase$# , (random() * 30000)::int + 10000 c21
kingbase$# , (random() * 40000)::int + 10000 c22
kingbase$# , (random() * 50000)::int + 10000 c23
kingbase$# , (random() * 60000)::int + 10000 c24
kingbase$# , (random() * 70000)::int + 10000 c25
kingbase$# , (random() * 80000)::int + 10000 c26
kingbase$# , (random() * 90000)::int + 10000 c27
kingbase$# , (random() * 10000)::int + 10000 c28
kingbase$# , (random() * 10000)::int + 10000 c29
kingbase$# , md5(random()::text) t01
kingbase$# , md5(random()::text) t02
kingbase$# , md5(random()::text) t03
kingbase$# , md5(random()::text) t04
kingbase$# , md5(random()::text) t05
kingbase$# , md5(random()::text) t06
kingbase$# , md5(random()::text) t07
kingbase$# , md5(random()::text) t08
kingbase$# , md5(random()::text) t09
kingbase$# , md5(random()::text) t10
kingbase$# , md5(random()::text) t11
kingbase$# , md5(random()::text) t12
kingbase$# , md5(random()::text) t13
kingbase$# , md5(random()::text) t14
kingbase$# , md5(random()::text) t15
kingbase$# , md5(random()::text) t16
kingbase$# , md5(random()::text) t17
kingbase$# , md5(random()::text) t18
kingbase$# , md5(random()::text) t19
kingbase$# , md5(random()::text) t20
kingbase$# from generate_series(1, 2000000) id;
kingbase$#
kingbase$# update pg_index
kingbase$# set indislive= true
kingbase$# where indrelid = 'bigtab'::regclass;
kingbase$#
kingbase$# analyse bigtab;
kingbase$# reindex table bigtab;
kingbase$#
kingbase$# end;
kingbase$# $$;
ANONYMOUS BLOCK
Time: 87110.126 ms (01:27.110)
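Pros: the indexes are built in bulk, so the run time is short; the statements follow a fixed pattern; indexes are maintained automatically; indexes added later are covered.
Cons: several SQL statements are involved, which makes it awkward to embed in a statement block.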
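Closing notes
In this test, reindex table rebuilt all of the table's updatable indexes only after analyse bigtab had been run; the author attributes this to reindex table depending on statistics, so run analyse on the table first.
reindex index is not affected by this: it can force a rebuild of an index that is not being updated, and it automatically sets indislive = true.
If an error occurs during REINDEX, every index that still needed rebuilding is left in the invalid state: it still occupies space and its definition still exists, but it cannot be used.
To avoid errors during REINDEX, skip unique indexes, indexes that foreign keys depend on, and similar indexes when disabling index updates.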
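The two checks described above can be sketched as catalog queries. They assume the PostgreSQL-style columns of pg_index and pg_constraint (indisvalid, indisready, indislive, conindid) are exposed unchanged in KingbaseES, which the original does not confirm.

```sql
-- After the load: list the state flags of every index on bigtab.
-- An index left behind by a failed rebuild shows indisvalid = false.
select indexrelid::regclass as index_name,
       indisunique,
       indisvalid,
       indisready,
       indislive
  from pg_index
 where indrelid = 'bigtab'::regclass
 order by indexrelid::regclass::text;

-- Before the load: indexes that should probably be left alone because
-- they are unique or back a constraint (primary key, unique, exclusion,
-- or a foreign key that references this table).
select i.indexrelid::regclass as index_name
  from pg_index i
 where i.indrelid = 'bigtab'::regclass
   and (i.indisunique
        or exists (select 1
                     from pg_constraint c
                    where c.conindid = i.indexrelid));
```

An index reported as invalid by the first query can then be rebuilt individually with reindex index, as noted above.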