1. View the PostgreSQL table structure and data

edbstore=# \d customers
Table "edbstore.customers"
Column | Type | Modifiers
----------------------+-----------------------+----------------------------------------------------------------
customerid | integer | not null default nextval('customers_customerid_seq'::regclass)
firstname | character varying(50) | not null
lastname | character varying(50) | not null
address1 | character varying(50) | not null
address2 | character varying(50) |
city | character varying(50) | not null
state | character varying(50) |
zip | integer |
country | character varying(50) | not null
region | smallint | not null
email | character varying(50) |
phone | character varying(50) |
creditcardtype | integer | not null
creditcard | character varying(50) | not null
creditcardexpiration | character varying(50) | not null
username | character varying(50) | not null
password | character varying(50) | not null
age | smallint |
income | integer |
gender | character varying(1) |
Indexes:
"customers_pkey" PRIMARY KEY, btree (customerid)
"ix_cust_username" UNIQUE, btree (username)
Referenced by:
TABLE "cust_hist" CONSTRAINT "fk_cust_hist_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE CASCADE
TABLE "orders" CONSTRAINT "fk_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE SET NULL

edbstore=# select count(1) from customers;
count
-------
20000
(1 row)

2. Use PostgreSQL's row_to_json function to export the table rows and save them in JSON format

edbstore=# \t
Tuples only is on.
edbstore=# \o customer.json
edbstore=# select row_to_json(r) from customers as r;
edbstore=# \q
[postgres@sht-sgmhadoopcm-01 dba]$ ls -lh customer.json
-rw-r--r-- 1 postgres appuser 7.7M Dec 7 22:37 customer.json
$ head -1 customer.json
{"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}

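The customers table has now been dumped to a JSON file, but this file cannot be imported into Elasticsearch directly. Attempting to do so fails with the following error: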

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/bank/_bulk?pretty&refresh" --data-binary "@customer.json"
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
}
],
"type" : "illegal_argument_exception",
"reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
},
"status" : 400
}

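According to the bulk API documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html), every document line in the request body must be preceded by an action/metadata line. Our JSON file contains only the documents, so nothing in it specifies a unique document id for each row.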
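To make the expected layout concrete, here is a minimal sketch of a bulk body: one action/metadata line followed by one source line per document, with a trailing newline at the end. The helper name build_bulk_body and the choice of customerid as the id field are assumptions made for illustration, not part of the original procedure.

```python
import json


def build_bulk_body(rows, id_field="customerid"):
    """Return a newline-delimited bulk body for a list of dict rows.

    build_bulk_body and id_field are hypothetical names used for this sketch.
    """
    lines = []
    for row in rows:
        # Action/metadata line: the action ("index") and the document _id.
        lines.append(json.dumps({"index": {"_id": str(row[id_field])}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"customerid": 1, "username": "user1"},
        {"customerid": 2, "username": "user2"},
    ]
    print(build_bulk_body(sample), end="")
```

A file that contains only source lines, like customer.json above, makes Elasticsearch read the first document as if it were an action line, which is why the request is rejected.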

3. Add an id line for each row of the JSON table data

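We saw earlier that the customers table has 20000 rows, so we need to generate 20000 matching id values. We do this with Python: create a file named build_id.py with the content below. Note that the upper bound is 20001. range includes its start value but excludes its end value, so range(1,20000) would only produce 1 to 19999; to get 1 to 20000 we write range(1,20001).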

# Print one bulk action line per row, with ids 1 to 20000
for i in range(1,20001):
    print('{"index":{"_id":"%s"}}' %i )

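Run the script and redirect its output to a file. Because the script is passed to the python interpreter, the file itself does not need execute permission.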

$ python build_id.py > build_id.txt

$ head -3 build_id.txt
{"index":{"_id":"1"}}
{"index":{"_id":"2"}}
{"index":{"_id":"3"}}
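The half-open behaviour of range is easy to confirm before generating the full file. The short check below is only a sketch; the variable names ids and action_lines are illustrative.

```python
# range(1, 20001) starts at 1 and stops before 20001.
ids = list(range(1, 20001))

# Number of ids, first id, last id.
print(len(ids), ids[0], ids[-1])  # 20000 1 20000

# The same action lines that build_id.py prints, built in memory.
action_lines = ['{"index":{"_id":"%s"}}' % i for i in ids]
print(action_lines[0])
print(action_lines[-1])
```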

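Use the Linux "paste" command to merge the id file with the table file. With a newline as the delimiter, the two files are interleaved line by line: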

$ paste -d'\n' build_id.txt customer.json > customer_new.json
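paste pairs lines purely by position and does not complain when the two files differ in length, so a stray blank line in either file would shift or corrupt the pairs. As a hedged alternative, the sketch below does the same interleaving in Python but refuses to continue when the counts differ or a line is not valid JSON. The function name interleave is hypothetical.

```python
import json


def interleave(action_lines, source_lines):
    """Pair each action line with its source line, failing on any mismatch."""
    # Ignore blank lines, for example an empty line at the end of an export.
    actions = [a.strip() for a in action_lines if a.strip()]
    sources = [s.strip() for s in source_lines if s.strip()]
    if len(actions) != len(sources):
        raise ValueError(
            "%d action lines but %d source lines" % (len(actions), len(sources))
        )
    merged = []
    for action, source in zip(actions, sources):
        # json.loads raises ValueError on a malformed line, so errors surface early.
        json.loads(action)
        json.loads(source)
        merged.extend([action, source])
    return merged


if __name__ == "__main__":
    demo = interleave(
        ['{"index":{"_id":"1"}}', '{"index":{"_id":"2"}}'],
        ['{"customerid":1}', '{"customerid":2}', ""],
    )
    print("\n".join(demo))
```

For the real files, the two arguments could be the results of open("build_id.txt") and open("customer.json"), since file objects iterate line by line.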

$ head -4 customer_new.json
{"index":{"_id":"1"}}
{"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}
{"index":{"_id":"2"}}
{"customerid":,"firstname":"HQNMZH","lastname":"UNUKXHJVXB","address1":"5119315633 Dell Way","address2":null,"city":"YNCERXJ","state":"AZ","zip":,"country":"US","region":,"email":"UNUKXHJVXB@dell.com","phone":"","creditcardtype":,"creditcard":"","creditcardexpiration":"2012/11","username":"user2","password":"password","age":,"income":,"gender":"M"}

4. The processed JSON file can now be imported into Elasticsearch. Test it:

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/customer/_bulk?pretty&refresh" --data-binary "@customer_new.json"
$ curl http://172.16.101.55:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open customer DvLoM7NjSYyjTwD5BSkK3A 10mb 10mb
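A bulk request can succeed as a whole while individual documents fail, so it is worth inspecting the response body as well as the index listing. The response carries a top-level "errors" flag and an "items" array with one entry per action. The sketch below assumes the response has been parsed into a dict; the function name summarize_bulk_response and the sample response are made up for illustration and are not output from the cluster above.

```python
def summarize_bulk_response(response):
    """Return (number of successful items, list of (_id, reason) failures)."""
    items = response.get("items", [])
    failures = []
    for item in items:
        # Each item is keyed by its action type, here "index".
        for result in item.values():
            if "error" in result:
                failures.append((result.get("_id"), result["error"].get("reason")))
    return len(items) - len(failures), failures


if __name__ == "__main__":
    # Hand-written sample response, for illustration only.
    sample = {
        "errors": True,
        "items": [
            {"index": {"_id": "1", "status": 201}},
            {"index": {"_id": "2", "status": 400,
                       "error": {"type": "mapper_parsing_exception",
                                 "reason": "failed to parse"}}},
        ],
    }
    ok, failed = summarize_bulk_response(sample)
    print(ok, failed)
```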
