elasticsearch data importing

Elasticsearch stores each piece of data as a document, which is exactly what I need.

The way to load many documents at once is the bulk API.

First, transform the raw data file data.json into a new file, new_data.json, in the format the bulk API expects.

Then import the data into Elasticsearch with:

curl -s -XPOST 'localhost:9200/_bulk' --data-binary @new_data.json
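
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not part of the original workflow: the function names build_bulk_request and send_bulk are hypothetical, and the host is assumed to be the same localhost:9200 used in the curl command. The Content-Type header is set because Elasticsearch 6.0 and later reject bulk requests that do not declare one.

```python
import urllib.request


def build_bulk_request(body, host="http://localhost:9200"):
    """Build (but do not send) a POST request for the _bulk endpoint.

    body is the raw bytes of a bulk file such as new_data.json.
    The host default is an assumption taken from the curl example.
    """
    if not body.endswith(b"\n"):
        body += b"\n"  # the bulk body must end with a newline
    return urllib.request.Request(
        host + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def send_bulk(path, host="http://localhost:9200"):
    """Read a bulk file from disk, send it, and return the response text."""
    with open(path, "rb") as f:
        request = build_bulk_request(f.read(), host)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling send_bulk("new_data.json") would need a running Elasticsearch node; build_bulk_request can be inspected without one.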

For example, suppose I have a raw JSON data file like the following, with one JSON object per line:

The file data.json

{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}

Now I need to import this data into Elasticsearch. To do that, I have to rework the file so that every document names its index and type.

This produces a new file, new_data.json:

{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}
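
The transformation from data.json to new_data.json can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names to_bulk_lines and convert_file are hypothetical, every input line is assumed to hold one JSON object, and all documents go to a single index and type. Later Elasticsearch versions removed mapping types, so on a recent cluster the _type key would likely have to be dropped.

```python
import json


def to_bulk_lines(raw_lines, index, doc_type):
    """Yield bulk-format lines: one action line before each document line."""
    # Compact separators reproduce the {"index":{...}} form shown above.
    action = json.dumps(
        {"index": {"_index": index, "_type": doc_type}},
        separators=(",", ":"),
    )
    for raw in raw_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines in the input
        json.loads(raw)  # raises ValueError if the line is not valid JSON
        yield action
        yield raw


def convert_file(src_path, dst_path, index, doc_type):
    """Write the bulk version of src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in to_bulk_lines(src, index, doc_type):
            dst.write(line + "\n")  # every line, including the last, ends with a newline
```

With the file names used in this post, the call would be convert_file("data.json", "new_data.json", "myindex1", "mytype1").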

In new_data.json, each data line is preceded by an action line that tells Elasticsearch which index and type the document belongs to.

If the JSON data file contains documents that do not all belong to the same _index or _type, just change the corresponding {"index":{"_******** line.

Here is an example of a valid bulk file for Elasticsearch, with one JSON object per line:

full_data.json

{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"value1","key2":"value2","key3":"value3"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"abcde","key2":"efg","key3":"klm"}
{"index":{"_index":"myindex2","_type":"mytype2"}}
{"newkey":"newvalue"}

Notice that there are two indexes in the file above: myindex1 and myindex2.

The data schema in index myindex2 is different from the one in index myindex1.

That is why every document needs its own {"index":{"_******** line in the new data file: the action line is what routes each document to the right index and type.
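
Before sending a mixed file like full_data.json, it can help to check where each document will end up. The following is a rough sketch with a hypothetical helper, count_by_target, that assumes the file contains only index actions, each followed by exactly one document line. It ignores blank lines and counts documents per (index, type) pair.

```python
import json


def count_by_target(bulk_lines):
    """Return {(index, type): document_count} for bulk-format lines."""
    lines = [line.strip() for line in bulk_lines if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected action/document pairs, got an odd number of lines")
    counts = {}
    # Even positions hold action lines, odd positions hold documents.
    for action_line, doc_line in zip(lines[0::2], lines[1::2]):
        meta = json.loads(action_line)["index"]  # KeyError if not an index action
        json.loads(doc_line)  # raises ValueError if the document is not valid JSON
        target = (meta["_index"], meta.get("_type"))
        counts[target] = counts.get(target, 0) + 1
    return counts
```

For the full_data.json example above, this reports two documents for myindex1/mytype1 and one for myindex2/mytype2.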

-----

Now I am writing a Python script to process some raw JSON data files.

Let's assume every line of the JSON data file follows the same schema. I will use this to extract the schema:

example_raw_data.json

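import sys

def get_schema():
    """Placeholder: return the schema shared by every line of the data file.

    Not implemented yet, so this returns None.
    """
    return None

if __name__ == "__main__":
    # Call get_schema() and print its result.
    print(get_schema())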
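
One way the stub above could be filled in is to read a single line and map each key to the name of its Python type. This is a guess at the intent, not the finished script: the one-argument signature and the helper same_keys are assumptions, and "schema" is taken to mean the top-level keys and their value types.

```python
import json


def get_schema(json_line):
    """Map each top-level key of one JSON line to the name of its Python type."""
    record = json.loads(json_line)
    if not isinstance(record, dict):
        raise ValueError("each line is expected to be a JSON object")
    return {key: type(value).__name__ for key, value in record.items()}


def same_keys(json_lines):
    """Return True if every non-blank line has the same set of top-level keys."""
    key_sets = {
        frozenset(json.loads(line)) for line in json_lines if line.strip()
    }
    return len(key_sets) <= 1
```

Running same_keys over the whole file first is a cheap way to test the assumption that all lines share one schema before relying on the schema of the first line.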

-------------Updated on 27th Nov. 2015 ----------

I solved this by reinventing the wheel and writing my own tool.

You can check it out here:

https://github.com/xros/json-py-es

-------------Updated on 28th Nov. 2015 at 01:33 A.M. ----------

pip install jsonpyes

I wrote this module and it works!

Happy hacking!
