1. Inspect the PostgreSQL table structure and row count

edbstore=# \d customers
Table "edbstore.customers"
Column | Type | Modifiers
----------------------+-----------------------+----------------------------------------------------------------
customerid | integer | not null default nextval('customers_customerid_seq'::regclass)
firstname | character varying(50) | not null
lastname | character varying(50) | not null
address1 | character varying(50) | not null
address2 | character varying(50) |
city | character varying(50) | not null
state | character varying(50) |
zip | integer |
country | character varying(50) | not null
region | smallint | not null
email | character varying(50) |
phone | character varying(50) |
creditcardtype | integer | not null
creditcard | character varying(50) | not null
creditcardexpiration | character varying(50) | not null
username | character varying(50) | not null
password | character varying(50) | not null
age | smallint |
income | integer |
gender | character varying(1) |
Indexes:
"customers_pkey" PRIMARY KEY, btree (customerid)
"ix_cust_username" UNIQUE, btree (username)
Referenced by:
TABLE "cust_hist" CONSTRAINT "fk_cust_hist_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE CASCADE
TABLE "orders" CONSTRAINT "fk_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE SET NULL

edbstore=# select count(1) from customers;
count
-------
20000
(1 row)

2. Use PostgreSQL's row_to_json function to export the table rows and save them in JSON format

edbstore=# \t
Tuples only is on.
edbstore=# \o customer.json
edbstore=# select row_to_json(r) from customers as r;
edbstore=# \q

[postgres@sht-sgmhadoopcm-01 dba]$ ls -lh customer.json
-rw-r--r-- 1 postgres appuser 7.7M Dec 7 22:37 customer.json

$ head -1 customer.json
{"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}
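Before going further, it can be worth confirming that the export really contains one JSON object per table row. The following is only a minimal sketch and not part of the original procedure: the function name count_json_rows is made up for illustration, and it assumes the file may contain blank lines or leading spaces left over from psql's output formatting.

```python
import json

def count_json_rows(lines):
    """Count lines that parse as JSON objects; blank lines are ignored."""
    count = 0
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # tolerate empty lines in the psql output
        row = json.loads(text)  # raises ValueError if the line is not valid JSON
        if not isinstance(row, dict):
            raise ValueError("line %d is not a JSON object" % lineno)
        count += 1
    return count

# In-memory sample standing in for the first lines of customer.json
sample = [' {"customerid":1,"firstname":"VKUUXF"}\n', '\n', '{"customerid":2}\n']
print(count_json_rows(sample))
```

Passing an open file handle for customer.json instead of the sample list should, for the table in this walkthrough, give a count of 20000.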

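The customers table has now been dumped to a JSON file, but the file cannot be loaded into Elasticsearch as-is. Posting it to the _bulk endpoint fails with the following error: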

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/bank/_bulk?pretty&refresh" --data-binary "@customer.json"
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
}
],
"type" : "illegal_argument_exception",
"reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
},
"status" : 400
}

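According to the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html, every document in a bulk request must be preceded by an action/metadata line. Our JSON file contains only the documents themselves, so no row has such a line specifying its unique document id.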
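To make the expected layout concrete, here is a small sketch of the two lines the bulk API wants for a single document: an action/metadata line, then the document itself, each terminated by a newline. The helper name bulk_pair and the shortened sample document are assumptions made for illustration only.

```python
import json

def bulk_pair(doc, doc_id, action="index"):
    """Return the two newline-terminated lines the bulk API expects for one document."""
    meta = {action: {"_id": str(doc_id)}}
    # Compact separators keep each JSON value on one line with no extra spaces.
    return (json.dumps(meta, separators=(",", ":")) + "\n"
            + json.dumps(doc, separators=(",", ":")) + "\n")

print(bulk_pair({"customerid": 1, "username": "user1"}, 1), end="")
# {"index":{"_id":"1"}}
# {"customerid":1,"username":"user1"}
```

The file exported in step 2 contains only lines of the second kind, which is why Elasticsearch rejects its first line as a malformed action/metadata line.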

3. Add an id line for each row of the JSON table data

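We saw earlier that the customers table has 20000 rows, so we need to generate 20000 matching id values. We do this with Python: create a file named build_id.py with the following content. Note that the upper bound is 20001. Python's range includes its start value but excludes its end value, so range(1, 20000) would only produce 1 to 19999; to get 1 to 20000 we write range(1, 20001).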

# Print one bulk action line per row, with ids 1 to 20000
for i in range(1, 20001):
    print('{"index":{"_id":"%s"}}' % i)

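Run the script and redirect its output to a file. Because it is invoked through the python interpreter, the file does not actually need executable permission: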

$ python build_id.py > build_id.txt

$ head -3 build_id.txt
{"index":{"_id":"1"}}
{"index":{"_id":"2"}}
{"index":{"_id":"3"}}
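The half-open behaviour of range is easy to check directly. This short sketch is only an illustration and is not part of build_id.py:

```python
ids = list(range(1, 20001))  # the end value 20001 itself is excluded

# First id, last id, and how many ids were generated
print(ids[0], ids[-1], len(ids))  # prints: 1 20000 20000
```

With range(1, 20000) the last id would be 19999 and the final table row would be left without an action line.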

Use the Linux paste command to merge the id file and the table file, interleaving them line by line:

$ paste -d'\n' build_id.txt customer.json > customer_new.json

$ head -4 customer_new.json
{"index":{"_id":"1"}}
{"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}
{"index":{"_id":"2"}}
{"customerid":,"firstname":"HQNMZH","lastname":"UNUKXHJVXB","address1":"5119315633 Dell Way","address2":null,"city":"YNCERXJ","state":"AZ","zip":,"country":"US","region":,"email":"UNUKXHJVXB@dell.com","phone":"","creditcardtype":,"creditcard":"","creditcardexpiration":"2012/11","username":"user2","password":"password","age":,"income":,"gender":"M"}
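paste pairs the two files purely by line position, so a stray blank line in either file would shift every pair after it. As a possible alternative, the id generation and the merge can be done in one pass, taking each _id from the row's own customerid. This is a sketch under assumptions rather than the method used above: the function names to_bulk_lines and convert are hypothetical, and it assumes every exported row has a customerid field.

```python
import json

def to_bulk_lines(rows, id_field="customerid"):
    """Yield an action line followed by the document line for each exported row."""
    for row in rows:
        text = row.strip()
        if not text:
            continue  # skip blank lines so action and document lines stay paired
        doc = json.loads(text)
        yield json.dumps({"index": {"_id": str(doc[id_field])}}, separators=(",", ":"))
        yield text

def convert(src_path, dst_path):
    """Write the bulk-ready file; every line, including the last, ends with a newline."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in to_bulk_lines(src):
            dst.write(line + "\n")
```

Calling convert("customer.json", "customer_new.json") would then produce a file with the same layout as the paste output above.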

4. The processed JSON file can now be imported into Elasticsearch. Test it:

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/customer/_bulk?pretty&refresh" --data-binary "@customer_new.json"
$ curl http://172.16.101.55:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open customer DvLoM7NjSYyjTwD5BSkK3A 10mb 10mb
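A bulk request can succeed as a whole while individual documents are rejected, so it may be worth inspecting the response body as well as the index listing. The sketch below assumes the curl response was saved to a file (for example by adding -o with a file name of your choice) and reads the top-level "errors" flag and the per-item "error" objects of the bulk response. The function name failed_items and the sample response are made up for illustration.

```python
import json

def failed_items(response_text):
    """Return (doc_id, status, reason) for every item the bulk request rejected."""
    body = json.loads(response_text)
    failures = []
    if not body.get("errors"):
        return failures  # the top-level flag is false when every item succeeded
    for item in body.get("items", []):
        # Each item has a single key naming the action, e.g. "index".
        for result in item.values():
            if "error" in result:
                failures.append((result.get("_id"), result.get("status"),
                                 result["error"].get("reason")))
    return failures

# Hand-written sample response, not real output from the cluster above
sample = json.dumps({
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse"}}},
    ],
})
print(failed_items(sample))
```

An empty list means no document was rejected.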
