elasticsearch data importing
ElasticSearch stores each piece of data in a document.
That's what I need.
Using the bulk API.
Transform the raw data file from data.json to be new_data.json .
And then do this to import data to ElasticSearch :
curl -s -XPOST 'localhost:9200/_bulk' --data-binary @new_data.json
For example, I now have a raw JSON data file as following:
The file data.json
{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}
Then I need to import these data to elasticsearch. So I have to manipulate this file by naming its index and type.
A new file will be created new_data.json
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}
There are information above each of the data line in the file new_data.json
And if the JSON data file contains data those are not in the same _index or _type, just change the {"index":{"_******** line
Here is an example of a valid JSON file for elasticsearch.
full_data.json
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"value1","key2":"value2","key3":"value3"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"abcde","key2":"efg","key3":"klm"}
{"index":{"_index":"myindex2","_type":"mytype2"}}
{"newkey":"newvalue"}
Notice that : There are 2 indexes in the file above. They are myindex1 and myindex2
And the data schema in index myindex2 is different from that in index myindex1 .
That's why it's so important to have so many lines of {"index":{"_******** in the new data file.
-----
Now I am coding a python scripe to manipulate with some raw JSON data files.
Let's assume each line of the JSON data file are in the same schema. And I will do this to generate the schema out.
example_raw_data.json
import sys def get_schema():
"""
"""
return None if __name__ == "__main__":
print(get_schema)
-------------Updated on 27th Nov. 2015 ----------
I solved this by inventing a new wheel
You can check this out:
https://github.com/xros/json-py-es
-------------Updated on 28th Nov. 2015 at 01:33 A.M. ----------
pip install jsonpyes
I wrote this module and it works!
Happy hacking!
elasticsearch data importing的更多相关文章
- 【Big Data - ELK】ELK(ElasticSearch, Logstash, Kibana)搭建实时日志分析平台
摘要: 前段时间研究的Log4j+Kafka中,有人建议把Kafka收集到的日志存放于ES(ElasticSearch,一款基于Apache Lucene的开源分布式搜索引擎)中便于查找和分析,在研究 ...
- [Big Data - ELK] ELK(ElasticSearch, Logstash, Kibana)搭建实时日志分析平台
ELK平台介绍 在搜索ELK资料的时候,发现这篇文章比较好,于是摘抄一小段: 以下内容来自: http://baidu.blog.51cto.com/71938/1676798 日志主要包括系统日志. ...
- ElasticsearchException: java.io.IOException: failed to read [id:0, file:/data/elasticsearch/nodes/0/_state/global-0.st]
from : https://www.cnblogs.com/hixiaowei/p/11213143.html 1.以前装过elasticsearch,重新安装elastic search ,报错 ...
- Ubuntu 14.04中Elasticsearch集群配置
Ubuntu 14.04中Elasticsearch集群配置 前言:本文可用于elasticsearch集群搭建参考.细分为elasticsearch.yml配置和系统配置 达到的目的:各台机器配置成 ...
- ElasticSearch 5学习(2)——Kibana+X-Pack介绍使用(全)
Kibana是一个为 ElasticSearch 提供的数据分析的 Web 接口.可使用它对日志进行高效的搜索.可视化.分析等各种操作.Kibana目前最新的版本5.0.2,回顾一下Kibana 3和 ...
- docker run elasticsearch
docker run -d --name=esNode1 -p 9200:9200 -p 9300:9300 elasticsearch:2.3 -Des.network.publish_host=& ...
- ElasticSearch详解与优化设计
简介 概念 安装部署 ES安装 数据索引 索引优化 内存优化 1简介 ElasticSearch(简称ES)是一个分布式.Restful的搜索及分析服务器,设计用于分布式计算:能够达到实时搜索,稳定, ...
- 分布式搜索引擎Elasticsearch的简单使用
官方网址:https://www.elastic.co/products/elasticsearch/ 一.特性 1.支持中文分词 2.支持多种数据源的全文检索引擎 3.分布式 4.基于lucene的 ...
- 【转】ELK(ElasticSearch, Logstash, Kibana)搭建实时日志分析平台
[转自]https://my.oschina.net/itblog/blog/547250 摘要: 前段时间研究的Log4j+Kafka中,有人建议把Kafka收集到的日志存放于ES(ElasticS ...
随机推荐
- SEO 优化,网站推广优化教程100条(SEO,网站关键字优化,怎么优化网站,如何优化网站关键字)
这篇文章不错. http://www.cnblogs.com/zangdalei/archive/2010/08/31/1814047.html 看了一半之后的,觉得不太靠谱,很多都不懂. 于是 找 ...
- 音乐TV2015校园招聘A第二大发行量(对中国科学院大学站)
标题叙述性说明:鉴于阵列A,尺寸n,数组元素1至n数位,但是,一些数字多次出现,有些数字不出现.请设计算法和程序.统计数据不会出现什么.什么号码多次出现. 您可以O(n)时间复杂度,O(1)求下完毕么 ...
- <转>如何利用socket进行HTTP访问
原文:<转>如何利用socket进行HTTP访问 如何利用socket进行HTTP访问 平常我们要访问某个URL一般都是通过浏览器进行:提交一个URL请求后,浏览器将请求发向目标服务器或者 ...
- EasyMonkeyDevice vs MonkeyDevice&HierarchyViewer API Mapping Matrix
1. 前言 本来这次文章的title是写成和前几篇类似的<EasyMonkeyDevice API实践全记录>,内容也打算把每个API的实践和建议给记录下来,但后来想了下觉得这样子并不是最 ...
- 对于GetBuffer() 与 ReleaseBuffer() 的一些分析
先 转载一段别人的文章 CString类的这几个函数, 一直在用, 但总感觉理解的不够透彻, 不时还实用错的现象. 今天抽时间和Nico一起分析了一下, 算是拨开了云雾: GetBuffer和Rele ...
- oracle 创建用户,授权用户,创建表,查询表
原文:oracle 创建用户,授权用户,创建表,查询表 oracle 创建用户,授权用户,创建表,查询表 假设oracle10g所有的都已经安装和配置好 第一步:win+R,进入运行,cmd; 第二步 ...
- jQuery实现表格行的动态增加与删除
删除之前删除2行后: 1<script> 8 $(document).ready(function(){ 9 //<tr/>居中 10 $("#tab tr" ...
- 在标记的HREF属性中javascript:alert(this.innerHTML)会怎么样?
原文:在标记的HREF属性中javascript:alert(this.innerHTML)会怎么样? <a href="javascript:alert(this.innerHTML ...
- JS中通过call方法实现继承
原文:JS中通过call方法实现继承 讲解都写在注释里面了,有不对的地方请拍砖,谢谢! <html xmlns="http://www.w3.org/1999/xhtml"& ...
- CSS3 实现六边形Div图片展示效果
原文:CSS3 实现六边形Div图片展示效果 效果图: 实现原理: 这个效果的主要css样式有: 1.>transform: rotate(120deg); 图片旋转 2.>overflo ...