Importing a PostgreSQL table into Elasticsearch

1. Inspect the PostgreSQL table structure and data

edbstore=# \d customers

                                          Table "edbstore.customers"

        Column        |         Type          |                           Modifiers

----------------------+-----------------------+----------------------------------------------------------------

 customerid           | integer               | not null default nextval('customers_customerid_seq'::regclass)

 firstname            | character varying(50) | not null

 lastname             | character varying(50) | not null

 address1             | character varying(50) | not null

 address2             | character varying(50) |

 city                 | character varying(50) | not null

 state                | character varying(50) |

 zip                  | integer               |

 country              | character varying(50) | not null

 region               | smallint              | not null

 email                | character varying(50) |

 phone                | character varying(50) |

 creditcardtype       | integer               | not null

 creditcard           | character varying(50) | not null

 creditcardexpiration | character varying(50) | not null

 username             | character varying(50) | not null

 password             | character varying(50) | not null

 age                  | smallint              |

 income               | integer               |

 gender               | character varying(1)  |

Indexes:

    "customers_pkey" PRIMARY KEY, btree (customerid)

    "ix_cust_username" UNIQUE, btree (username)

Referenced by:

    TABLE "cust_hist" CONSTRAINT "fk_cust_hist_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE CASCADE

    TABLE "orders" CONSTRAINT "fk_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE SET NULL

edbstore=# select count(1) from customers;

 count

-------

 20000

(1 row)

2. Use PostgreSQL's row_to_json function to export the table rows and save them as JSON, one object per line

edbstore=# \t

Tuples only is on.

edbstore=# \o customer.json

edbstore=# select row_to_json(r) from customers as r;

edbstore=# \q

[postgres@sht-sgmhadoopcm-01 dba]$ ls -lh customer.json

-rw-r--r-- 1 postgres appuser 7.7M Dec  7 22:37 customer.json

$ head -1 customer.json

 {"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}

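The customers table has now been dumped to a JSON file, but the file cannot be imported into Elasticsearch as it stands. Posting it to the bulk API fails with the following error: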

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/bank/_bulk?pretty&refresh" --data-binary "@customer.json"

{

  "error" : {

    "root_cause" : [

      {

        "type" : "illegal_argument_exception",

        "reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"

      }

    ],

    "type" : "illegal_argument_exception",

    "reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"

  },

  "status" : 400

}

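According to the bulk API documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html), every document line must be preceded by an action/metadata line. Our JSON data has no such lines, so it does not specify a unique document id for each row.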
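
To make the failure concrete, here is a minimal Python sketch of the line layout the bulk API expects: an action/metadata line followed by the document source line. The function name `first_bad_action_line` is hypothetical, and the check is deliberately simplified (it assumes every action is followed by a source line, which holds for `index`, `create` and `update` but not for `delete`). It is an illustration, not Elasticsearch's actual parser.

```python
import json

# Actions that are followed by a source line in a bulk body.
# (delete has no source line and is left out of this simplified sketch.)
ACTIONS_WITH_SOURCE = {"index", "create", "update"}


def first_bad_action_line(lines):
    """Return the 1-based position (among non-empty lines) of the first line
    that should be an action/metadata line but is not, or None if all are fine."""
    rows = [ln for ln in lines if ln.strip()]
    for pos in range(0, len(rows), 2):
        try:
            obj = json.loads(rows[pos])
        except json.JSONDecodeError:
            return pos + 1
        # An action line is an object with exactly one key (the action name)
        # whose value is itself an object, e.g. {"index": {"_id": "1"}}.
        if not (isinstance(obj, dict) and len(obj) == 1):
            return pos + 1
        (action, meta), = obj.items()
        if action not in ACTIONS_WITH_SOURCE or not isinstance(meta, dict):
            return pos + 1
    return None


# Shortened sample rows based on the first exported record.
raw = ['{"customerid":1,"firstname":"VKUUXF"}']
fixed = ['{"index":{"_id":"1"}}', '{"customerid":1,"firstname":"VKUUXF"}']
print(first_bad_action_line(raw))    # 1
print(first_bad_action_line(fixed))  # None
```

The raw export fails at its very first line because that line is a document, not an action, which matches the "Malformed action/metadata line [1]" message.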

3. Add a document id to each row of the JSON table data

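We saw earlier that the customers table has 20000 rows, so we need to generate 20000 matching id values. We do this with Python: create a file named build_id.py with the content below. Note that the upper bound is 20001. Python's range includes its start but excludes its end, so range(1,20000) would only produce 1 to 19999; to get 1 to 20000 we write range(1,20001).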

for i in range(1,20001):

    print('{"index":{"_id":"%s"}}' %i )

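Run the script with python and redirect its output to a file (because it is invoked through the interpreter, the file does not need executable permission)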

$ python build_id.py > build_id.txt

$ head -3 build_id.txt

{"index":{"_id":"1"}}

{"index":{"_id":"2"}}

{"index":{"_id":"3"}}
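
As a quick check of the half-open range behaviour described in this step, the sketch below builds the same action lines inside a function and inspects the first and last ids. The function name `action_lines` is only illustrative, and it uses `json.dumps` with compact separators instead of string formatting; the output text is the same.

```python
import json


def action_lines(row_count):
    """Yield one bulk action line per row, with ids 1..row_count."""
    # range(1, row_count + 1) stops before row_count + 1, so the last id
    # produced is row_count itself.
    for i in range(1, row_count + 1):
        yield json.dumps({"index": {"_id": str(i)}}, separators=(",", ":"))


lines = list(action_lines(20000))
print(len(lines))   # 20000
print(lines[0])     # {"index":{"_id":"1"}}
print(lines[-1])    # {"index":{"_id":"20000"}}
```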

Use the Linux "paste" command to merge the id file and the table file, alternating one line from each

$ paste -d'\n' build_id.txt customer.json > customer_new.json

$ head -4 customer_new.json

{"index":{"_id":"1"}}

 {"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}

{"index":{"_id":"2"}}

 {"customerid":,"firstname":"HQNMZH","lastname":"UNUKXHJVXB","address1":"5119315633 Dell Way","address2":null,"city":"YNCERXJ","state":"AZ","zip":,"country":"US","region":,"email":"UNUKXHJVXB@dell.com","phone":"","creditcardtype":,"creditcard":"","creditcardexpiration":"2012/11","username":"user2","password":"password","age":,"income":,"gender":"M"}
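
The same interleaving can be sketched in Python without an intermediate id file. This is an alternative to the paste approach, not what the steps above do: it assumes each exported row carries a `customerid` field and takes the `_id` from that field rather than from the line position, so the ids stay aligned even if the export is not ordered. The function name `to_bulk_lines` and the second sample row are hypothetical.

```python
import json


def to_bulk_lines(json_rows, id_field="customerid"):
    """Yield an action line before each exported row, keyed by id_field."""
    for row in json_rows:
        row = row.strip()
        if not row:
            continue  # skip blank lines such as a trailing newline
        doc = json.loads(row)
        action = {"index": {"_id": str(doc[id_field])}}
        yield json.dumps(action, separators=(",", ":"))
        yield row  # the document line itself is passed through unchanged


rows = [
    ' {"customerid":1,"firstname":"VKUUXF"}',   # shortened first record
    ' {"customerid":2,"firstname":"EXAMPLE"}',  # hypothetical second record
]
bulk = list(to_bulk_lines(rows))
print("\n".join(bulk))
```

When writing the result to a file, terminate the last line with a newline as well, since the bulk API requires the body to end with one.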

4. The processed JSON file can now be imported into Elasticsearch. Run the bulk request, then list the indices to confirm

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/customer/_bulk?pretty&refresh" --data-binary "@customer_new.json"

$ curl http://172.16.101.55:9200/_cat/indices?v

health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size

yellow open   customer DvLoM7NjSYyjTwD5BSkK3A                               10mb           10mb
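
The `_bulk` call in step 4 returns a JSON body with a top-level `errors` flag and one entry per action under `items`, and a successful HTTP status does not by itself mean that every document was indexed. A minimal sketch for summarising a saved response follows; the function name `failed_items` and the sample response are hypothetical and heavily shortened.

```python
import json


def failed_items(response_text):
    """Return (doc_id, status, error) for every action that failed."""
    body = json.loads(response_text)
    if not body.get("errors"):
        return []  # errors is false when every action succeeded
    failures = []
    for item in body.get("items", []):
        # Each item is keyed by its action name, e.g. {"index": {...}}.
        for result in item.values():
            if "error" in result:
                failures.append(
                    (result.get("_id"), result.get("status"), result["error"])
                )
    return failures


# Hypothetical, shortened response: one success and one failure.
sample = json.dumps({
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "customer", "_id": "1", "status": 201}},
        {"index": {"_index": "customer", "_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse"}}},
    ],
})
print(failed_items(sample))
```

Redirecting the curl output of the bulk request to a file and passing its contents to this function gives a short list of rows to investigate, which is easier to read than 20000 pretty-printed result entries.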
