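Overview

How can a large volume of data, on the order of tens or hundreds of millions of rows, be inserted quickly into a table that has indexes?

Data preparation

Prepare a table with twenty indexes.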


kingbase=# \d+ bigtab
Table "kingbase.bigtab"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
c01 | integer | | | | plain | |
c02 | integer | | | | plain | |
c03 | integer | | | | plain | |
c04 | integer | | | | plain | |
c05 | integer | | | | plain | |
c06 | integer | | | | plain | |
c07 | integer | | | | plain | |
c08 | integer | | | | plain | |
c09 | integer | | | | plain | |
c10 | integer | | | | plain | |
c11 | integer | | | | plain | |
c12 | integer | | | | plain | |
c13 | integer | | | | plain | |
c14 | integer | | | | plain | |
c15 | integer | | | | plain | |
c16 | integer | | | | plain | |
c17 | integer | | | | plain | |
c18 | integer | | | | plain | |
c19 | integer | | | | plain | |
c20 | integer | | | | plain | |
c21 | integer | | | | plain | |
c22 | integer | | | | plain | |
c23 | integer | | | | plain | |
c24 | integer | | | | plain | |
c25 | integer | | | | plain | |
c26 | integer | | | | plain | |
c27 | integer | | | | plain | |
c28 | integer | | | | plain | |
c29 | integer | | | | plain | |
t01 | text | | | | extended | |
t02 | text | | | | extended | |
t03 | text | | | | extended | |
t04 | text | | | | extended | |
t05 | text | | | | extended | |
t06 | text | | | | extended | |
t07 | text | | | | extended | |
t08 | text | | | | extended | |
t09 | text | | | | extended | |
t10 | text | | | | extended | |
t11 | text | | | | extended | |
t12 | text | | | | extended | |
t13 | text | | | | extended | |
t14 | text | | | | extended | |
t15 | text | | | | extended | |
t16 | text | | | | extended | |
t17 | text | | | | extended | |
t18 | text | | | | extended | |
t19 | text | | | | extended | |
t20 | text | | | | extended | |
Indexes:
"bigtab_i01" btree (c01)
"bigtab_i02" btree (c02)
"bigtab_i03" btree (c03)
"bigtab_i04" btree (c04)
"bigtab_i05" btree (c05)
"bigtab_i06" btree (c06)
"bigtab_i07" btree (c07)
"bigtab_i08" btree (c08)
"bigtab_i09" btree (c09)
"bigtab_i10" btree (c10)
"bigtab_i11" btree (c11)
"bigtab_i12" btree (c12)
"bigtab_i13" btree (c13)
"bigtab_i14" btree (c14)
"bigtab_i15" btree (c15)
"bigtab_i16" btree (c16)
"bigtab_i17" btree (c17)
"bigtab_i18" btree (c18)
"bigtab_i19" btree (c19)
"bigtab_i20" btree (c20)
Access method: heap

kingbase=#
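The listing above shows only the finished table. A minimal sketch of one way to create it follows; it is not taken from the original session, and it assumes that string_agg, lpad and generate_series behave as they do in PostgreSQL. The variable names cols and sfx are made up for the example.

```sql
-- Sketch: build bigtab (id, c01..c29 integer, t01..t20 text)
-- and the twenty B-tree indexes bigtab_i01..bigtab_i20.
do
$$
declare
    cols text;  -- generated column list
    sfx  text;  -- two-digit suffix such as '01'
begin
    cols := 'id integer, '
         || (select string_agg('c' || lpad(n::text, 2, '0') || ' integer', ', ' order by n)
               from generate_series(1, 29) n)
         || ', '
         || (select string_agg('t' || lpad(n::text, 2, '0') || ' text', ', ' order by n)
               from generate_series(1, 20) n);

    execute 'create table bigtab (' || cols || ')';

    -- One index per column c01..c20.
    for i in 1..20 loop
        sfx := lpad(i::text, 2, '0');
        execute 'create index bigtab_i' || sfx || ' on bigtab (c' || sfx || ')';
    end loop;
end;
$$;
```

Running \d+ bigtab afterwards should list the same columns and indexes as the output above.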

Method 1: Insert the data directly and let the indexes be maintained automatically



kingbase=#
kingbase=# insert into bigtab
kingbase-# select id
kingbase-# , (random() * 100)::int + 1000 c01
kingbase-# , (random() * 200)::int + 1000 c02
kingbase-# , (random() * 300)::int + 10000 c03
kingbase-# , (random() * 400)::int + 10000 c04
kingbase-# , (random() * 500)::int + 10000 c05
kingbase-# , (random() * 600)::int + 10000 c06
kingbase-# , (random() * 700)::int + 10000 c07
kingbase-# , (random() * 800)::int + 10000 c08
kingbase-# , (random() * 900)::int + 10000 c09
kingbase-# , (random() * 1000)::int + 10000 c10
kingbase-# , (random() * 2000)::int + 10000 c11
kingbase-# , (random() * 3000)::int + 10000 c12
kingbase-# , (random() * 4000)::int + 10000 c13
kingbase-# , (random() * 5000)::int + 10000 c14
kingbase-# , (random() * 6000)::int + 10000 c15
kingbase-# , (random() * 7000)::int + 10000 c16
kingbase-# , (random() * 8000)::int + 10000 c17
kingbase-# , (random() * 9000)::int + 10000 c18
kingbase-# , (random() * 10000)::int + 10000 c19
kingbase-# , (random() * 20000)::int + 10000 c20
kingbase-# , (random() * 30000)::int + 10000 c21
kingbase-# , (random() * 40000)::int + 10000 c22
kingbase-# , (random() * 50000)::int + 10000 c23
kingbase-# , (random() * 60000)::int + 10000 c24
kingbase-# , (random() * 70000)::int + 10000 c25
kingbase-# , (random() * 80000)::int + 10000 c26
kingbase-# , (random() * 90000)::int + 10000 c27
kingbase-# , (random() * 10000)::int + 10000 c28
kingbase-# , (random() * 10000)::int + 10000 c29
kingbase-# , md5(random()::text) t01
kingbase-# , md5(random()::text) t02
kingbase-# , md5(random()::text) t03
kingbase-# , md5(random()::text) t04
kingbase-# , md5(random()::text) t05
kingbase-# , md5(random()::text) t06
kingbase-# , md5(random()::text) t07
kingbase-# , md5(random()::text) t08
kingbase-# , md5(random()::text) t09
kingbase-# , md5(random()::text) t10
kingbase-# , md5(random()::text) t11
kingbase-# , md5(random()::text) t12
kingbase-# , md5(random()::text) t13
kingbase-# , md5(random()::text) t14
kingbase-# , md5(random()::text) t15
kingbase-# , md5(random()::text) t16
kingbase-# , md5(random()::text) t17
kingbase-# , md5(random()::text) t18
kingbase-# , md5(random()::text) t19
kingbase-# , md5(random()::text) t20
kingbase-# from generate_series(1, 2000000) id;
INSERT 0 2000000
Time: 299331.143 ms (04:59.331)

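Pros: a single statement; indexes are maintained automatically; indexes added later are handled automatically.

Cons: the indexes are maintained row by row, so the load is slow (about 5 minutes for 2 million rows in this test).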

Method 2: Drop the indexes, insert the data, then recreate the indexes


kingbase=#
kingbase=# do
kingbase-# $$
kingbase$# begin
kingbase$# drop index bigtab_i01;
kingbase$# drop index bigtab_i02;
kingbase$# drop index bigtab_i03;
kingbase$# drop index bigtab_i04;
kingbase$# drop index bigtab_i05;
kingbase$# drop index bigtab_i06;
kingbase$# drop index bigtab_i07;
kingbase$# drop index bigtab_i08;
kingbase$# drop index bigtab_i09;
kingbase$# drop index bigtab_i10;
kingbase$# drop index bigtab_i11;
kingbase$# drop index bigtab_i12;
kingbase$# drop index bigtab_i13;
kingbase$# drop index bigtab_i14;
kingbase$# drop index bigtab_i15;
kingbase$# drop index bigtab_i16;
kingbase$# drop index bigtab_i17;
kingbase$# drop index bigtab_i18;
kingbase$# drop index bigtab_i19;
kingbase$# drop index bigtab_i20;
kingbase$#
kingbase$# insert into bigtab
kingbase$# select id
kingbase$# , (random() * 100)::int + 1000 c01
kingbase$# , (random() * 200)::int + 1000 c02
kingbase$# , (random() * 300)::int + 10000 c03
kingbase$# , (random() * 400)::int + 10000 c04
kingbase$# , (random() * 500)::int + 10000 c05
kingbase$# , (random() * 600)::int + 10000 c06
kingbase$# , (random() * 700)::int + 10000 c07
kingbase$# , (random() * 800)::int + 10000 c08
kingbase$# , (random() * 900)::int + 10000 c09
kingbase$# , (random() * 1000)::int + 10000 c10
kingbase$# , (random() * 2000)::int + 10000 c11
kingbase$# , (random() * 3000)::int + 10000 c12
kingbase$# , (random() * 4000)::int + 10000 c13
kingbase$# , (random() * 5000)::int + 10000 c14
kingbase$# , (random() * 6000)::int + 10000 c15
kingbase$# , (random() * 7000)::int + 10000 c16
kingbase$# , (random() * 8000)::int + 10000 c17
kingbase$# , (random() * 9000)::int + 10000 c18
kingbase$# , (random() * 10000)::int + 10000 c19
kingbase$# , (random() * 20000)::int + 10000 c20
kingbase$# , (random() * 30000)::int + 10000 c21
kingbase$# , (random() * 40000)::int + 10000 c22
kingbase$# , (random() * 50000)::int + 10000 c23
kingbase$# , (random() * 60000)::int + 10000 c24
kingbase$# , (random() * 70000)::int + 10000 c25
kingbase$# , (random() * 80000)::int + 10000 c26
kingbase$# , (random() * 90000)::int + 10000 c27
kingbase$# , (random() * 10000)::int + 10000 c28
kingbase$# , (random() * 10000)::int + 10000 c29
kingbase$# , md5(random()::text) t01
kingbase$# , md5(random()::text) t02
kingbase$# , md5(random()::text) t03
kingbase$# , md5(random()::text) t04
kingbase$# , md5(random()::text) t05
kingbase$# , md5(random()::text) t06
kingbase$# , md5(random()::text) t07
kingbase$# , md5(random()::text) t08
kingbase$# , md5(random()::text) t09
kingbase$# , md5(random()::text) t10
kingbase$# , md5(random()::text) t11
kingbase$# , md5(random()::text) t12
kingbase$# , md5(random()::text) t13
kingbase$# , md5(random()::text) t14
kingbase$# , md5(random()::text) t15
kingbase$# , md5(random()::text) t16
kingbase$# , md5(random()::text) t17
kingbase$# , md5(random()::text) t18
kingbase$# , md5(random()::text) t19
kingbase$# , md5(random()::text) t20
kingbase$# from generate_series(1, 2000000) id;
kingbase$#
kingbase$# create index bigtab_i01 on bigtab (c01);
kingbase$# create index bigtab_i02 on bigtab (c02);
kingbase$# create index bigtab_i03 on bigtab (c03);
kingbase$# create index bigtab_i04 on bigtab (c04);
kingbase$# create index bigtab_i05 on bigtab (c05);
kingbase$# create index bigtab_i06 on bigtab (c06);
kingbase$# create index bigtab_i07 on bigtab (c07);
kingbase$# create index bigtab_i08 on bigtab (c08);
kingbase$# create index bigtab_i09 on bigtab (c09);
kingbase$# create index bigtab_i10 on bigtab (c10);
kingbase$# create index bigtab_i11 on bigtab (c11);
kingbase$# create index bigtab_i12 on bigtab (c12);
kingbase$# create index bigtab_i13 on bigtab (c13);
kingbase$# create index bigtab_i14 on bigtab (c14);
kingbase$# create index bigtab_i15 on bigtab (c15);
kingbase$# create index bigtab_i16 on bigtab (c16);
kingbase$# create index bigtab_i17 on bigtab (c17);
kingbase$# create index bigtab_i18 on bigtab (c18);
kingbase$# create index bigtab_i19 on bigtab (c19);
kingbase$# create index bigtab_i20 on bigtab (c20);
kingbase$#
kingbase$# end;
kingbase$# $$;
ANONYMOUS BLOCK
Time: 83069.170 ms (01:23.069)

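Pros: the indexes are built in bulk, which makes this the fastest of the three methods (about 83 seconds in this test).

Cons: the statements are complex and hard-coded; the drop index and create index statements have to be maintained by hand; indexes added later are not covered.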
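The hard-coded index names are what cause the last two drawbacks. One possible workaround, which is not the author's method, is to read the index definitions from the catalog before dropping the indexes and replay them afterwards. The sketch below assumes that pg_get_indexdef and pg_constraint.conindid are available and behave as in PostgreSQL; the variable names are made up for the example. Indexes that back a constraint are left alone because drop index cannot remove them.

```sql
do
$$
declare
    idx_names text[];  -- qualified names of the indexes to drop
    idx_defs  text[];  -- matching "create index ..." statements
    stmt      text;
begin
    -- Collect the plain indexes on bigtab (those not backing a constraint).
    select coalesce(array_agg(i.indexrelid::regclass::text), '{}'),
           coalesce(array_agg(pg_get_indexdef(i.indexrelid)), '{}')
      into idx_names, idx_defs
      from pg_index i
     where i.indrelid = 'bigtab'::regclass
       and not exists (select 1
                         from pg_constraint c
                        where c.conindid = i.indexrelid);

    foreach stmt in array idx_names loop
        execute 'drop index ' || stmt;
    end loop;

    -- Run the bulk insert from Method 2 here.

    -- Recreate every index from its saved definition.
    foreach stmt in array idx_defs loop
        execute stmt;
    end loop;
end;
$$;
```

Because the definitions are read at run time, an index created after this block was written is dropped and rebuilt along with the others.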

Method 3: Disable index updates, insert the data, then rebuild all of the table's indexes


kingbase=# do
kingbase-# $$
kingbase$# begin
kingbase$#
kingbase$# update pg_index
kingbase$# set indislive= false
kingbase$# where indrelid = 'bigtab'::regclass;
kingbase$#
kingbase$# insert into bigtab
kingbase$# select id
kingbase$# , (random() * 100)::int + 1000 c01
kingbase$# , (random() * 200)::int + 1000 c02
kingbase$# , (random() * 300)::int + 10000 c03
kingbase$# , (random() * 400)::int + 10000 c04
kingbase$# , (random() * 500)::int + 10000 c05
kingbase$# , (random() * 600)::int + 10000 c06
kingbase$# , (random() * 700)::int + 10000 c07
kingbase$# , (random() * 800)::int + 10000 c08
kingbase$# , (random() * 900)::int + 10000 c09
kingbase$# , (random() * 1000)::int + 10000 c10
kingbase$# , (random() * 2000)::int + 10000 c11
kingbase$# , (random() * 3000)::int + 10000 c12
kingbase$# , (random() * 4000)::int + 10000 c13
kingbase$# , (random() * 5000)::int + 10000 c14
kingbase$# , (random() * 6000)::int + 10000 c15
kingbase$# , (random() * 7000)::int + 10000 c16
kingbase$# , (random() * 8000)::int + 10000 c17
kingbase$# , (random() * 9000)::int + 10000 c18
kingbase$# , (random() * 10000)::int + 10000 c19
kingbase$# , (random() * 20000)::int + 10000 c20
kingbase$# , (random() * 30000)::int + 10000 c21
kingbase$# , (random() * 40000)::int + 10000 c22
kingbase$# , (random() * 50000)::int + 10000 c23
kingbase$# , (random() * 60000)::int + 10000 c24
kingbase$# , (random() * 70000)::int + 10000 c25
kingbase$# , (random() * 80000)::int + 10000 c26
kingbase$# , (random() * 90000)::int + 10000 c27
kingbase$# , (random() * 10000)::int + 10000 c28
kingbase$# , (random() * 10000)::int + 10000 c29
kingbase$# , md5(random()::text) t01
kingbase$# , md5(random()::text) t02
kingbase$# , md5(random()::text) t03
kingbase$# , md5(random()::text) t04
kingbase$# , md5(random()::text) t05
kingbase$# , md5(random()::text) t06
kingbase$# , md5(random()::text) t07
kingbase$# , md5(random()::text) t08
kingbase$# , md5(random()::text) t09
kingbase$# , md5(random()::text) t10
kingbase$# , md5(random()::text) t11
kingbase$# , md5(random()::text) t12
kingbase$# , md5(random()::text) t13
kingbase$# , md5(random()::text) t14
kingbase$# , md5(random()::text) t15
kingbase$# , md5(random()::text) t16
kingbase$# , md5(random()::text) t17
kingbase$# , md5(random()::text) t18
kingbase$# , md5(random()::text) t19
kingbase$# , md5(random()::text) t20
kingbase$# from generate_series(1, 2000000) id;
kingbase$#
kingbase$# update pg_index
kingbase$# set indislive= true
kingbase$# where indrelid = 'bigtab'::regclass;
kingbase$#
kingbase$# analyse bigtab;
kingbase$# reindex table bigtab;
kingbase$#
kingbase$# end;
kingbase$# $$;
ANONYMOUS BLOCK
Time: 87110.126 ms (01:27.110)

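Pros: the indexes are built in bulk, so the load is fast (about 87 seconds in this test); the statements follow a fixed pattern; the indexes are maintained automatically; indexes added later are covered.

Cons: it takes several SQL statements, which are awkward to embed in a statement block.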

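Final notes

reindex table relies on statistics, so analyse must be run on the table first for it to rebuild all of the table's updatable indexes successfully.

reindex index is not affected by this. It can force a rebuild of an index whose updates were disabled, and it sets indislive = true automatically.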

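If an exception occurs during REINDEX, every index that still needs to be rebuilt is left in the invalid state: the index still occupies space and its definition still exists, but it cannot be used.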
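To see which state each index was left in, the flags in pg_index can be queried directly. This is a small sketch that assumes the indisvalid, indisready and indislive columns have the same meaning as in PostgreSQL.

```sql
-- List every index on bigtab together with its state flags.
select i.indexrelid::regclass as index_name,
       i.indisvalid,   -- false: the index cannot be used by queries
       i.indisready,   -- false: the index is not maintained on writes
       i.indislive     -- false: the index is ignored entirely
  from pg_index i
 where i.indrelid = 'bigtab'::regclass
 order by i.indexrelid::regclass::text;
```

Any row that still shows false after the load identifies an index that needs another reindex index.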

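To avoid exceptions during REINDEX, skip unique indexes, indexes that foreign keys depend on, and similar indexes when updating pg_index to disable index maintenance.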
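One way to find the indexes that should be skipped is sketched below. It assumes that pg_constraint.conindid links a primary key, unique or foreign key constraint to the index it relies on, as it does in PostgreSQL; the column alias backs_constraint is made up for the example.

```sql
-- Flag the indexes on bigtab that are unique or tied to a constraint.
select i.indexrelid::regclass as index_name,
       i.indisunique,
       exists (select 1
                 from pg_constraint c
                where c.conindid = i.indexrelid) as backs_constraint
  from pg_index i
 where i.indrelid = 'bigtab'::regclass
 order by i.indexrelid::regclass::text;
```

Rows where both flags are false are the candidates for the pg_index update in Method 3; adding the same two conditions to that update's where clause would leave the remaining indexes untouched.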
