Basic Concepts

A few concepts are core to Elasticsearch. Understanding them from the outset makes the rest of the learning process much easier.

Near Realtime (NRT)

Elasticsearch is a near-real-time search platform: there is a slight latency (normally one second) between the time you index a document and the time it becomes searchable.
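The latency described above can be pictured with a toy model. This is only a sketch, not the Elasticsearch API: the `ToyIndex` class and its methods are hypothetical names, and the periodic refresh that Elasticsearch runs on its own (roughly once a second by default) is triggered by hand here.

```python
class ToyIndex:
    """Toy model of near-real-time search; not the Elasticsearch API."""

    def __init__(self):
        self.buffer = []      # indexed, but not yet visible to searches
        self.searchable = []  # visible to searches

    def index(self, doc):
        # A newly indexed document only lands in the buffer.
        self.buffer.append(doc)

    def refresh(self):
        # Elasticsearch does this periodically; here we call it by hand.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def search(self, term):
        return [doc for doc in self.searchable if term in doc]


idx = ToyIndex()
idx.index("quick brown fox")
before = idx.search("fox")  # the document is not searchable yet
idx.refresh()
after = idx.search("fox")   # the document is searchable after the refresh
```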

Cluster

A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.
 
Make sure that you don't reuse the same cluster name in different environments; otherwise nodes might join the wrong cluster. For instance, you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters.
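One way to keep the names apart is to derive them from the environment instead of typing them by hand. The sketch below assumes the three environments named above; the helper names `cluster_name` and `render_config` are hypothetical. The output is a line for elasticsearch.yml, where `cluster.name` is the setting that controls which cluster a node joins.

```python
ENVIRONMENTS = ("dev", "stage", "prod")  # assumption: the three environments above


def cluster_name(app, env):
    """Build a per-environment cluster name such as 'logging-prod'."""
    if env not in ENVIRONMENTS:
        raise ValueError("unknown environment: %s" % env)
    return "%s-%s" % (app, env)


def render_config(app, env):
    """Render the cluster.name line for elasticsearch.yml."""
    return "cluster.name: %s\n" % cluster_name(app, env)


names = [cluster_name("logging", env) for env in ENVIRONMENTS]
prod_config = render_config("logging", "prod")
```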
 
Note that it is perfectly valid to have a cluster with only a single node in it. You may also run multiple independent clusters, each with its own unique cluster name.

Node

A node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities. Just like a cluster, a node is identified by a name, which by default is a random Universally Unique IDentifier (UUID) assigned to the node at startup. You can define any node name you want if you do not want the default. This name matters for administration, where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.
 
A node can be configured to join a specific cluster by the cluster name. By default, each node is set up to join a cluster named elasticsearch, which means that if you start several nodes on your network, and they can discover each other, they will all automatically form and join a single cluster named elasticsearch.
 
In a single cluster, you can have as many nodes as you want. Furthermore, if no other Elasticsearch nodes are currently running on your network, starting a single node will by default form a new single-node cluster named elasticsearch.

Index

An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.

In a single cluster, you can define as many indexes as you want.

Type

Deprecated in 6.0.0.  See Removal of mapping types

A type used to be a logical category/partition of your index that allowed you to store different types of documents in the same index, e.g. one type for users and another type for blog posts. It is no longer possible to create multiple types in an index, and the whole concept of types will be removed in a later version. See Removal of mapping types for more.

Document

A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. A document is expressed in JSON (JavaScript Object Notation), a ubiquitous internet data interchange format.
 
Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, it must actually be indexed into (assigned to) a type inside that index.
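As a rough illustration, the sketch below builds the pieces of a request that would index one customer document. It only constructs the method, path, and JSON body and sends nothing. The helper `index_request`, the index name `customer`, the type name `_doc`, and the document fields are assumptions made for the example.

```python
import json


def index_request(index, doc_type, doc_id, document):
    """Return (method, path, body) for indexing one document."""
    # The only naming rule checked here is the one stated above: all lowercase.
    if index != index.lower():
        raise ValueError("index names must be all lowercase")
    path = "/%s/%s/%s" % (index, doc_type, doc_id)
    body = json.dumps(document, sort_keys=True)
    return "PUT", path, body


method, path, body = index_request(
    "customer", "_doc", "1", {"name": "John Doe"}  # hypothetical document
)
```

The path carries all three coordinates of the document: the index it resides in, the type it is assigned to, and its ID.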

Shards & Replicas

An index can potentially store an amount of data that exceeds the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node, or may be too slow to serve search requests from a single node alone.
 
To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster.
 
Sharding is important for two primary reasons:
1. It allows you to horizontally split/scale your content volume.
2. It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance/throughput.
 
The mechanics of how a shard is distributed, and of how its documents are aggregated back into search results, are completely managed by Elasticsearch and are transparent to you as the user.
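Although this is handled for you, the basic idea of spreading documents across shards can be sketched as a hash of a routing key taken modulo the number of primary shards. This is a simplified model: Elasticsearch uses its own hash function internally, and `zlib.crc32` below is only a stand-in. The function name `pick_shard` and the `doc-N` IDs are hypothetical.

```python
import zlib
from collections import Counter


def pick_shard(routing_key, num_primary_shards):
    """Map a routing key (by default the document ID) to a shard number."""
    # crc32 is a stand-in hash for illustration, not what Elasticsearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Spread 1000 hypothetical document IDs over 5 primary shards.
counts = Counter(pick_shard("doc-%d" % i, 5) for i in range(1000))
```

Because the shard number depends on the number of primary shards, the same document would map to a different shard if that number changed, which is one reason the shard count is not something to change casually.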
 
In a network/cloud environment where failures can be expected at any time, it is highly recommended to have a failover mechanism in case a shard or node goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index's shards into what are called replica shards, or replicas for short.
 
Replication is important for two primary reasons:
1. It provides high availability in case a shard/node fails. For this reason, a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
2. It allows you to scale out your search volume/throughput, since searches can be executed on all replicas in parallel.
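The allocation rule in the first point can be illustrated with a toy allocator. This is a sketch under simplifying assumptions, not the allocation algorithm Elasticsearch actually runs: primaries are placed round-robin, each replica goes to the least-loaded node that holds no other copy of the same shard, and replicas with nowhere to go stay unassigned. The function name `allocate` and the node names are hypothetical.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Place shard copies so that no node holds two copies of one shard."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        primary_node = nodes[shard % len(nodes)]  # round-robin primaries
        placement[primary_node].append(("primary", shard))
        used = {primary_node}
        for _ in range(num_replicas):
            candidates = [n for n in nodes if n not in used]
            if not candidates:
                # No node left without a copy of this shard.
                unassigned.append(("replica", shard))
                continue
            target = min(candidates, key=lambda n: len(placement[n]))
            placement[target].append(("replica", shard))
            used.add(target)
    return placement, unassigned


two_nodes, missing_two = allocate(5, 1, ["node-a", "node-b"])
one_node, missing_one = allocate(5, 1, ["node-a"])
```

With two nodes, all ten copies find a home. With a single node, the five replicas remain unassigned because the only node already holds every primary.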
 
To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index has primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards).
 
The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you can still change the number of replicas dynamically at any time. You can change the number of shards for an existing index using the _shrink and _split APIs, but this is not a trivial task, and planning for the correct number of shards up front is the better approach.
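As a sketch of what these two operations look like, the snippet below builds the JSON bodies for creating an index with explicit shard and replica counts and for later changing only the replica count. Nothing is sent over the network. The helper names and the numbers are assumptions for the example; `number_of_shards` and `number_of_replicas` are the index settings involved.

```python
import json


def create_index_body(shards, replicas):
    """Body for creating an index (PUT /<index>) with explicit counts."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


def update_replicas_body(replicas):
    """Body for changing replicas later (PUT /<index>/_settings)."""
    return {"index": {"number_of_replicas": replicas}}


create_json = json.dumps(create_index_body(3, 2), sort_keys=True)
update_json = json.dumps(update_replicas_body(1), sort_keys=True)
```

Note that the update body carries no shard count: only the number of replicas is adjusted this way.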
 
By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica. This means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica), for a total of 10 shards per index.
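The arithmetic behind that total is simply the number of primaries multiplied by one plus the number of replicas. A minimal sketch, with `total_shards` as a hypothetical helper name:

```python
def total_shards(primaries, replicas):
    """Total shard copies for one index: primaries plus all replica copies."""
    return primaries * (1 + replicas)


default_total = total_shards(5, 1)  # 5 primaries + 5 replica shards
no_replicas = total_shards(5, 0)    # primaries only
```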
 

Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
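The limit can be checked with a line of arithmetic, and it also gives a lower bound on how many primary shards a very large index needs. The sketch below assumes documents are spread evenly across shards and ignores every other sizing concern (disk, memory, query speed); `min_primary_shards` is a hypothetical helper name.

```python
INTEGER_MAX_VALUE = 2**31 - 1              # Java's Integer.MAX_VALUE
LUCENE_MAX_DOCS = INTEGER_MAX_VALUE - 128  # per-shard document limit


def min_primary_shards(total_docs):
    """Fewest primary shards allowed by the document limit alone."""
    # Integer ceiling division; assumes an even spread across shards.
    return max(1, -(-total_docs // LUCENE_MAX_DOCS))


shards_for_ten_billion = min_primary_shards(10_000_000_000)
```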
 
With that out of the way, let’s get started with the fun part…
