ElasticSearch stores each piece of data in a document.

That's what I need.

Using the bulk API.

Transform the raw data file from data.json to be new_data.json .

And then do this to import data to ElasticSearch :

curl -s -XPOST 'localhost:9200/_bulk' --data-binary @new_data.json

For example, I now have a raw JSON data file as following:

The file   data.json

{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}

Then I need to import these data to elasticsearch. So I have to manipulate this file by naming its index and type.

A new file will be created  new_data.json

{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}

There are information above each of the data line in the file new_data.json

And if the JSON data file contains data those are not in the same _index or _type, just change the {"index":{"_********   line

Here is an example of a valid JSON file for elasticsearch.

full_data.json

{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"value1","key2":"value2","key3":"value3"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"abcde","key2":"efg","key3":"klm"}
{"index":{"_index":"myindex2","_type":"mytype2"}}
{"newkey":"newvalue"}

Notice that : There are 2 indexes in the file above. They are   myindex1  and  myindex2

And the data schema in index myindex2 is different from that in index myindex1 .

That's why it's so important to have so many lines of {"index":{"_********    in the new data file.

-----

Now I am coding a python scripe to manipulate with some raw JSON data files.

Let's assume each line of the JSON data file are in the same schema. And I will do this to generate the schema out.

example_raw_data.json

import sys

def get_schema():
"""
"""
return None if __name__ == "__main__":
print(get_schema)

-------------Updated on 27th Nov. 2015 ----------

I solved this by inventing a new wheel

You can check this out:

https://github.com/xros/json-py-es

-------------Updated on 28th Nov. 2015  at 01:33 A.M. ----------

pip install jsonpyes

I wrote this module and it works!

Happy hacking!

elasticsearch data importing的更多相关文章

  1. 【Big Data - ELK】ELK(ElasticSearch, Logstash, Kibana)搭建实时日志分析平台

    摘要: 前段时间研究的Log4j+Kafka中,有人建议把Kafka收集到的日志存放于ES(ElasticSearch,一款基于Apache Lucene的开源分布式搜索引擎)中便于查找和分析,在研究 ...

  2. [Big Data - ELK] ELK(ElasticSearch, Logstash, Kibana)搭建实时日志分析平台

    ELK平台介绍 在搜索ELK资料的时候,发现这篇文章比较好,于是摘抄一小段: 以下内容来自: http://baidu.blog.51cto.com/71938/1676798 日志主要包括系统日志. ...

  3. ElasticsearchException: java.io.IOException: failed to read [id:0, file:/data/elasticsearch/nodes/0/_state/global-0.st]

    from : https://www.cnblogs.com/hixiaowei/p/11213143.html 1.以前装过elasticsearch,重新安装elastic search ,报错 ...

  4. Ubuntu 14.04中Elasticsearch集群配置

    Ubuntu 14.04中Elasticsearch集群配置 前言:本文可用于elasticsearch集群搭建参考.细分为elasticsearch.yml配置和系统配置 达到的目的:各台机器配置成 ...

  5. ElasticSearch 5学习(2)——Kibana+X-Pack介绍使用(全)

    Kibana是一个为 ElasticSearch 提供的数据分析的 Web 接口.可使用它对日志进行高效的搜索.可视化.分析等各种操作.Kibana目前最新的版本5.0.2,回顾一下Kibana 3和 ...

  6. docker run elasticsearch

    docker run -d --name=esNode1 -p 9200:9200 -p 9300:9300 elasticsearch:2.3 -Des.network.publish_host=& ...

  7. ElasticSearch详解与优化设计

    简介 概念 安装部署 ES安装 数据索引 索引优化 内存优化 1简介 ElasticSearch(简称ES)是一个分布式.Restful的搜索及分析服务器,设计用于分布式计算:能够达到实时搜索,稳定, ...

  8. 分布式搜索引擎Elasticsearch的简单使用

    官方网址:https://www.elastic.co/products/elasticsearch/ 一.特性 1.支持中文分词 2.支持多种数据源的全文检索引擎 3.分布式 4.基于lucene的 ...

  9. 【转】ELK(ElasticSearch, Logstash, Kibana)搭建实时日志分析平台

    [转自]https://my.oschina.net/itblog/blog/547250 摘要: 前段时间研究的Log4j+Kafka中,有人建议把Kafka收集到的日志存放于ES(ElasticS ...

随机推荐

  1. C# WinForm开发系列 - WebBrowser

    原文:C# WinForm开发系列 - WebBrowser 介绍Vs 2005中带的WebBrowser控件使用以及一些疑难问题的解决方法, 如如何正确显示中文, 屏蔽右键菜单, 设置代理等; 收集 ...

  2. 步步详解近期大火的density_peak超赞聚类

    近期忙着在公司捣腾基于SOA的应急框架,还是前两周才在微博上看见了density_peak,被圈内好些人转载. 由于这个算法的名字起的实在惹眼,都没好意思怎么把这个算法名字翻译成中文,当然更惹眼的是, ...

  3. Swift入门教程:基本运算符

    基本运算符 Swift所支持的基本运算符 赋值运算符:= 复合赋值运算符:+=.-= 算数运算符:+.-.*./ 求余运算符:% 自增.自减运算符:++.-- 比较运算符:==.!=.>.< ...

  4. selenium之多线程启动grid分布式测试框架封装(四)

    九.工具类,启动所有远程服务的浏览器 在utils包中创建java类:LaunchAllRemoteBrowsers package com.lingfeng.utils; import java.n ...

  5. css mainDiv和popbox居中

    <style>     .beCenter {         width:460px;         height:212px;         background:#ccc;    ...

  6. ASP.NET如何显示农历时间

    ASP.NET如何显示农历时间 CS部分代码如下: 代码如下: public string ChineseTimeNow = "";  public string ForignTi ...

  7. Android从无知到有知——NO.4

    因为我们做的是手机安全卫士,因此,我们需要一个地图定位功能,些相关的项目,也有一些教程.到百度官方下载了相关的jar包和API,但自己建项目的时候却不是那么顺利,bug不断,弄得心烦意乱,最后最终臣服 ...

  8. [译]Java 设计模式之桥接

    (文章翻译自Java Design Pattern: Bridge) 简单来说,桥梁设计模式是一个两层的抽象. 桥接模式就是从一个抽象中实现中解耦以便两个都可以独立的改变.桥接使用封装聚合而且使用继承 ...

  9. SSMS2008插件开发(4)--自定义菜单

    原文:SSMS2008插件开发(4)--自定义菜单 打开上次的项目MySSMSAddin中的Connect类,发现该类继于了两个接口:IDTExtensibility2和IDTCommandTarge ...

  10. Strongly connected(hdu4635(强连通分量))

    /* http://acm.hdu.edu.cn/showproblem.php?pid=4635 Strongly connected Time Limit: 2000/1000 MS (Java/ ...