Elasticsearch-->Get Started-->Basic concepts
https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-concepts.html
There are a few concepts that are core to Elasticsearch. Understanding these concepts from the outset will tremendously help ease the learning process.
Near Realtime (NRT)
Elasticsearch is a near-realtime search platform. What this means is there is a slight latency (normally one second) from the time you index a document until the time it becomes searchable.
Cluster
A cluster is a collection of one or more nodes (servers) that together holds your entire data and provides federated indexing and search capabilities across all nodes. A cluster is identified by a unique name which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.
Make sure that you don’t reuse the same cluster names in different environments, otherwise you might end up with nodes joining the wrong cluster. For instance you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters.
Note that it is valid and perfectly fine to have a cluster with only a single node in it. Furthermore, you may also have multiple independent clusters each with its own unique cluster name.
Node
A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name which by default is a random Universally Unique IDentifier (UUID) that is assigned to the node at startup. You can define any node name you want if you do not want the default. This name is important for administration purposes where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.
A node can be configured to join a specific cluster by the cluster name. By default, each node is set up to join a cluster named elasticsearch which means that if you start up a number of nodes on your network and—assuming they can discover each other—they will all automatically form and join a single cluster named elasticsearch.
In a single cluster, you can have as many nodes as you want. Furthermore, if there are no other Elasticsearch nodes currently running on your network, starting a single node will by default form a new single-node cluster named elasticsearch.
Index
An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. An index is identified by a name (that must be all lowercase) and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.
In a single cluster, you can define as many indexes as you want.
Type(Deprecated in 6.0.0.)
A type used to be a logical category/partition of your index to allow you to store different types of documents in the same index, e.g. one type for users, another type for blog posts. It is no longer possible to create multiple types in an index, and the whole concept of types will be removed in a later version. See Removal of mapping types for more.
Document
A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. This document is expressed in JSON (JavaScript Object Notation) which is a ubiquitous internet data interchange format.
Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, a document actually must be indexed/assigned to a type inside an index.
Shards & Replicas
An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.
To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.
Sharding is important for two primary reasons:
- It allows you to horizontally split/scale your content volume
- It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput
The mechanics of how a shard is distributed and also how its documents are aggregated back into search requests are completely managed by Elasticsearch and is transparent to you as the user.
In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.
Replication is important for two primary reasons:
- It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
- It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.
To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards).
The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may also change the number of replicas dynamically anytime. You can change the number of shards for an existing index using the _shrink and _split APIs, however this is not a trivial task and pre-planning for the correct number of shards is the optimal approach.
By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.
总结
A cluster is a collection of one or more nodes (servers)
A node is a single server that is part of your cluster
An index is a collection of documents that have somewhat similar characteristics. In a single cluster, you can define as many indexes as you want.
A document is a basic unit of information that can be indexed. Within an index, you can store as many documents as you want.
Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.
each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times.
Elasticsearch-->Get Started-->Basic concepts的更多相关文章
- (二)Basic Concepts 基本概念
Basic Concepts There are a few concepts that are core to Elasticsearch. Understanding these concepts ...
- Basic Concepts of Block Media Recovery
Basic Concepts of Block Media Recovery Whenever block corruption has been automatically detected, yo ...
- CMUSphinx Learn - Basic concepts of speech
Basic concepts of speech Speech is a complex phenomenon. People rarely understand how is it produced ...
- Nginx Tutorial #1: Basic Concepts(转)
add by zhj: 文章写的很好,适合初学者 原文:https://www.netguru.com/codestories/nginx-tutorial-basics-concepts Intro ...
- [Network]Introduction and Basic concepts
[该系列是检讨计算机网络知识.因为现在你想申请出国.因此,在写这篇博客系列的大多数英语.虽然英语,但大多数就是我自己的感受和理解,供大家学习和讨论起来] 1 Network Edge The devi ...
- Lesson 1 Basic Concepts: Part 1
www.how-to-build-websites.com/basic-concepts/part1.php An introduction to domain names, web servers, ...
- Eric Linux - 1 Basic concepts of linux
Computer basic Computer 5 parts CPU Input Output Memory External storage device. CPU RISC: Reduced I ...
- Basic concepts of docker/kubernete/kata-container
Kubereters An open-source system for automating deployment, scaling, and management of containerized ...
- MVC LINQ to SQL: Basic Concepts and Features
http://www.codeproject.com/Articles/215712/LINQ-to-SQL-Basic-Concepts-and-Features
- (C/C++) Interview in English - Basic concepts.
Question Key words Anwser A assignment operator abstract class It is a class that has one or more pu ...
随机推荐
- JQUERY实现的小巧简洁的无限级树形菜单
JQUERY实现的小巧简洁的无限级树形菜单,可用于后台或前台侧栏菜单!兼容性也比较好. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Tra ...
- 玩点不同之CSS的BEM规范
1.BEM引入背景 因为项目的业务逻辑发生重大变化,所以原来大半年的开发周期里做的事情基本上变成无用功.但是公司的项目上线时间依旧没有改变.剩下的时间只有区区的两个月,要做的功能是有社区+电商+核心业 ...
- 【BZOJ2803】[Poi2012]Prefixuffix 结论题
[BZOJ2803][Poi2012]Prefixuffix Description 对于两个串S1.S2,如果能够将S1的一个后缀移动到开头后变成S2,就称S1和S2循环相同.例如串ababba和串 ...
- go build说明
go build命令用于编译我们指定的源码文件或代码包以及它们的依赖包. 例如,如果我们在执行go build命令时不后跟任何代码包,那么命令将试图编译当前目录所对应的代码包.例如,我们想编译goc2 ...
- 【Python之路】第十五篇--Web框架
Web框架本质 众所周知,对于所有的Web应用,本质上其实就是一个socket服务端,用户的浏览器其实就是一个socket客户端. #!/usr/bin/env python #coding:utf- ...
- TFS二次开发-基线文件管理器(1)-设计
CMMI在做基线文件管理的时候,常常是需要记录一部分基线文件的版本.并且这个基线文件记录也需要进行版本控制.TFS在做这件事的时候一般来说会选用标签(Lable)来做一系列文件的版本记录. 但是我发现 ...
- <2013 12 17> 雅思写作、口语相关
这一个多月,参加了两次雅思考试,成绩分别为: Overall:6.5 L:7.0 R:7.5 W:6.0 S:5.5 Overall:7.0 L:7.0 ...
- $(document).ready() $(window).load 及js的window.onload
1.$(document).ready() 简写为$(function(){}) DOM结构绘制完成执行,而无需等到图片或其他媒体下载完毕. 2.$(window).load 在有时候确实我们有需 ...
- xlwt 模块 操作excel
1.xlwt 基本用法 import xlwt #1 新建文件 new_file = open('test.xls', 'w') new_file.close() #2 创建工作簿 wookbook ...
- Java中的异常和处理详解(转发:https://www.cnblogs.com/lulipro/p/7504267.html)
简介 程序运行时,发生的不被期望的事件,它阻止了程序按照程序员的预期正常执行,这就是异常.异常发生时,是任程序自生自灭,立刻退出终止,还是输出错误给用户?或者用C语言风格:用函数返回值作为执行状态?. ...