ElasticSearch 2.1.1 (5) - Document APIs
This section describes the following CRUD APIs:
Single document APIs
Index API
Query:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
Result:
{
"_shards" : {
"total" : 10,
"failed" : 0,
"successful" : 10
},
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_version" : 1,
"created" : true
}
total -
Indicates how many shard copies (primary and replica shards) the index operation should be executed on.
successful -
Indicates the number of shard copies the index operation succeeded on.
failures -
An array that contains replication related errors in the case an index operation failed on a replica shard.
Replica shards may not all be started when an indexing operation successfully returns (by default, a quorum is required). In that case, total will be equal to the total number of shard copies based on the index replica settings, and successful will be equal to the number of shard copies started (primary plus replicas). As there were no failures, failed will be 0.
Automatic Index Creation
Disable automatic index creation
action.auto_create_index => false
Disable automatic mapping (type) creation
index.mapper.dynamic => false
Index creation black/white list
action.auto_create_index => +aaa*,-bbb*,+ccc*,-*
- +: allowed
- -: disallowed
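How such a pattern list could be evaluated can be sketched in Python. This is only a minimal sketch, assuming the patterns are tried from left to right, the first matching pattern decides, and an index name that matches nothing is rejected. The function name may_auto_create is hypothetical and not part of Elasticsearch.

```python
from fnmatch import fnmatchcase


def may_auto_create(index_name, setting):
    """Evaluate an action.auto_create_index-style pattern list.

    Assumption: patterns are tried left to right and the first match wins.
    """
    for raw in setting.split(","):
        pattern = raw.strip()
        allowed = not pattern.startswith("-")
        # Drop the leading "+" or "-" marker before glob matching.
        glob = pattern.lstrip("+-")
        if fnmatchcase(index_name, glob):
            return allowed
    return False  # assumption: nothing matched, so creation is rejected


rules = "+aaa*,-bbb*,+ccc*,-*"
print(may_auto_create("aaa-2016", rules))  # True
print(may_auto_create("bbb-2016", rules))  # False
print(may_auto_create("other", rules))     # False
```

With the list above, only index names starting with aaa or ccc would be created automatically; the trailing -* rejects everything else.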
Versioning
- Optimistic concurrency control
Typical use case: a transactional read-then-update. When reading in order to update, it is recommended to set preference to _primary
curl -XPUT 'localhost:9200/twitter/tweet/1?version=2' -d '{
"message" : "elasticsearch now has versioning support, double cool!"
}'
NOTE: versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, then the operation is executed without any version checks.
Version types
internal
only index the document if the given version is identical to the version of the stored document.
external or external_gt
only index the document if the given version is strictly higher than the version of the stored document or if there is no existing document. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.
external_gte
only index the document if the given version is equal or higher than the version of the stored document. If there is no existing document the operation will succeed as well. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.
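force
index the document regardless of the version of the stored document, or if there is no existing document. The given version will be used as the new version. Typically used for correcting errors.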
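The version types listed above can be compared side by side with a small Python sketch. It is an illustration of the rules as described in these notes, not Elasticsearch code: the function name version_allows_write is hypothetical, a missing document is modelled as None, and the behavior of force (no check at all) is taken from the 2.x reference rather than from the notes.

```python
VERSION_TYPES = {"internal", "external", "external_gt", "external_gte", "force"}


def version_allows_write(version_type, stored_version, given_version):
    """Return True if a write with given_version would pass the version check.

    stored_version is None when no document exists yet.
    """
    if version_type not in VERSION_TYPES:
        raise ValueError(f"unknown version type: {version_type}")
    if version_type == "force":
        return True  # no check at all; meant for correcting errors
    if version_type == "internal":
        # Must be identical to the stored version, so a missing document fails.
        return stored_version is not None and given_version == stored_version
    if stored_version is None:
        return True  # the external variants accept a brand-new document
    if version_type == "external_gte":
        return given_version >= stored_version
    return given_version > stored_version  # external / external_gt


print(version_allows_write("internal", 2, 2))      # True
print(version_allows_write("external", 2, 2))      # False
print(version_allows_write("external_gte", 2, 2))  # True
```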
Operation Type
op_type
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
create
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
Automatic ID Generation
POST used instead of PUT (op_type will automatically be set to create)
$ curl -XPOST 'http://localhost:9200/twitter/tweet/' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
automatic ID generation (6a8ca01c-7896-48e9-81cc-9f70661fcb32)
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "6a8ca01c-7896-48e9-81cc-9f70661fcb32",
"_version" : 1,
"created" : true
}
Routing
default
Hash of the document’s id value
explicit control
The value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter
$ curl -XPOST 'http://localhost:9200/twitter/tweet?routing=kimchy' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
When setting up explicit mapping, the _routing field can be optionally used to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
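The idea of hashing a routing value to pick a shard can be sketched as follows. This is a minimal sketch under stated assumptions: zlib.crc32 is only a stand-in for the hash function Elasticsearch actually uses, and pick_shard is a hypothetical helper, so the shard numbers it produces will not match a real cluster.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number.

    Assumption: crc32 stands in for the real routing hash.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# Default routing: the document id is the value that gets hashed.
shard_for_id = pick_shard("1", 5)

# Explicit routing: every document indexed with routing=kimchy
# is hashed on "kimchy" and therefore lands on the same shard.
shard_for_user = pick_shard("kimchy", 5)

print(shard_for_id, shard_for_user)
```

Because the shard depends only on the hashed value, a later get or delete has to supply the same routing value to find the document.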
Parents & Children
A child document is indexed by specifying its parent; its routing value is automatically set to that of the parent
$ curl -XPUT 'localhost:9200/blogs/blog_tag/1122?parent=1111' -d '{
"tag" : "something"
}'
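Timestamp (Deprecated in 2.0.0-beta2.)
Use a normal date field instead and set its value explicitly
TTL (time to live) (Deprecated in 2.0.0-beta2.)
The current implementation is to be replaced by a different one in a future version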
Distributed
The index operation is directed to the primary shard based on its route (see the Routing section above) and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Primary shard => Replicas
Write Consistency
Quorum
(>replicas/2+1) of active shards are available
action.write_consistency
- one
- quorum
- all
behavior
- node-by-node: the action.write_consistency setting
- per-operation: the consistency request parameter
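How many active shard copies each consistency level asks for can be sketched in Python. This is a hedged sketch, assuming quorum means a majority of the copies in a replication group (primary plus replicas) and that with only one replica the requirement falls back to a single copy; the function name required_active_copies is hypothetical, and the exact rule should be checked against the reference for the version in use.

```python
def required_active_copies(consistency, number_of_replicas):
    """Number of active shard copies needed before a write proceeds."""
    copies = 1 + number_of_replicas  # one primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return copies
    if consistency == "quorum":
        # Assumption: a majority is only enforced with more than two copies.
        return copies // 2 + 1 if copies > 2 else 1
    raise ValueError(f"unknown consistency level: {consistency}")


for replicas in (1, 2, 4):
    print(replicas, required_active_copies("quorum", replicas))
```

Under these assumptions an index with 2 replicas needs 2 of its 3 copies active, and one with 4 replicas needs 3 of 5.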
sync replication
The index operation only returns after all active shards within the replication group have indexed the document
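Refresh
- refresh=true refreshes the relevant shard (not the whole index) right after the operation, so the document appears in search results immediately
- Setting it to true can lead to poor performance, from both an indexing and a search standpoint
- The Get API is realtime and doesn't require a refresh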
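Noop Updates
The index API always creates a new version of the document, even if the content hasn't changed
To avoid that, use the _update API with detect_noop
- true: compare the document content and skip the write if nothing changed
- false: ignore the document content and always write
There is no hard and fast rule about when noop updates aren't acceptable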
Timeout
default - 1 min
explicit
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?timeout=5m' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
Get API
Get:
curl -XGET 'http://localhost:9200/twitter/tweet/1'
Result:
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_version" : 1,
"found": true,
"_source" : {
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
}
Check Exists:
curl -XHEAD -i 'http://localhost:9200/twitter/tweet/1'
Result:
HTTP/1.1 404 Not Found
es.resource.type: index_expression
es.resource.id: twitter
es.index: twitter
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Realtime
To disable
action.get.realtime => false
fields (Good Practice)
- BECAUSE: With realtime GET there is no notion of stored fields, at least for a period of time (basically, until the next flush)
- THEREFORE: Assume fields will be loaded from source when using realtime GET
Optional Type
_type is optional
Set it to _all to fetch the first document matching the ID across all types
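Source filtering
Default: the get operation returns the contents of _source
Exceptions: the fields parameter is used, or the _source field is disabled
To turn off _source retrieval, set _source to false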
curl -XGET 'http://localhost:9200/twitter/tweet/1?_source=false'
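Large documents
Fetching only part of _source saves network overhead
- _source_include
- _source_exclude
Both parameters take
- a comma-separated list of fields
- wildcard expressions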
Example
curl -XGET 'http://localhost:9200/twitter/tweet/1?_source_include=*.id&_source_exclude=entities'
Short notation (when only includes are needed)
curl -XGET 'http://localhost:9200/twitter/tweet/1?_source=*.id,retweeted'
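The effect of include and exclude patterns on a document can be approximated in Python. This is only a rough sketch of the idea, assuming patterns are matched against dotted field paths, that a pattern matching a parent object covers everything beneath it, and that excludes win over includes. The helper names and the sample tweet are hypothetical, and Elasticsearch's own matching may differ in edge cases.

```python
from fnmatch import fnmatchcase


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) for every leaf of a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def matches(path, patterns):
    """True if the path, or one of its parent paths, matches a pattern."""
    parts = path.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatchcase(p, pat) for p in prefixes for pat in patterns)


def filter_source(source, include=(), exclude=()):
    """Rebuild the document from the leaves that survive both filters."""
    result = {}
    for path, value in flatten(source):
        if include and not matches(path, include):
            continue
        if exclude and matches(path, exclude):
            continue
        *parents, leaf = path.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


tweet = {
    "user": {"id": 1, "name": "kimchy"},
    "entities": {"id": 2, "urls": []},
    "retweeted": False,
}
print(filter_source(tweet, include=["*.id"], exclude=["entities"]))
# {'user': {'id': 1}}
```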
Fields
Example
curl -XGET 'http://localhost:9200/twitter/tweet/1?fields=title,content'
Backward compatibility
If the requested fields are not stored, they will be fetched from the _source
This functionality has been replaced by source filtering
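Field values
Document fields (always returned as an array)
Metadata fields (never returned as an array)
_routing
_parent
Leaf/Object
- Leaf fields can be returned
- Object fields cannot be returned; such requests fail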
Generated fields
If no refresh occurred between indexing and the GET request
GET will access the transaction log to fetch the document
Some fields are generated ONLY when indexing
Requesting such a field
- default: error
- to ignore: ignore_errors_on_generated_fields=true
Getting the _source directly
Direct
curl -XGET 'http://localhost:9200/twitter/tweet/1/_source'
Source filtering
curl -XGET 'http://localhost:9200/twitter/tweet/1/_source?_source_include=*.id&_source_exclude=entities'
Existence
curl -XHEAD -i 'http://localhost:9200/twitter/tweet/1/_source'
Routing
Get:
curl -XGET 'http://localhost:9200/twitter/tweet/1?routing=kimchy'
Error:
{"error":{"root_cause":[
{"type":"index_not_found_exception",
"reason":"no such index",
"resource.type":"index_expression",
"resource.id":"twitter",
"index":"twitter"}],
"type":"index_not_found_exception",
"reason":"no such index",
"resource.type":"index_expression",
"resource.id":"twitter",
"index":"twitter"},"status":404}
Preference
Controls a preference of which shard replicas to execute the get request on
Default
Random
Preference
_primary
The operation will go and be executed only on the primary shards.
_local
The operation will prefer to be executed on a local allocated shard if possible.
Custom(string) value
A custom value will be used to guarantee that the same shards will be used for the same custom value. This can help with "jumping values" when hitting different shards in different refresh states. A sample value can be something like the web session id, or the user name.
Refresh
Setting refresh to true refreshes the relevant shard before the get operation. It may cause a heavy load on the system (and slow down indexing)
Distributed
The get operation gets hashed into a specific shard id.
It then gets redirected to one of the replicas within that shard id and returns the result.
The replicas are the primary shard and its replicas within that shard id group. This means that the more replicas we have, the better GET scales.
Versioning support
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn’t disappear immediately, although you won’t be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
Delete API
Delete:
$ curl -XDELETE 'http://localhost:9200/twitter/tweet/1'
Result:
{
"_shards" : {
"total" : 10,
"failed" : 0,
"successful" : 10
},
"found" : true,
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_version" : 2
}
Versioning
Each document indexed is versioned. When deleting a document, the version can be specified to make sure the relevant document we are trying to delete is actually being deleted and it has not changed in the meantime. Every write operation executed on a document, deletes included, causes its version to be incremented.
Routing
$ curl -XDELETE 'http://localhost:9200/twitter/tweet/1?routing=kimchy'
Note that issuing a delete without the correct routing will cause the document not to be deleted.
Many times, the routing value is not known when deleting a document. For those cases, when specifying the _routing mapping as required, and no routing value is specified, the delete will be broadcast automatically to all shards.
Parent
Note that deleting a parent document does not automatically delete its children.
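To delete all children of a parent, use the delete-by-query plugin on the automatically generated (and indexed) _parent field, whose format is parent_type#parent_id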
Automatic index creation
Automatically creates an index if it has not been created before
Automatically creates a dynamic type mapping for the specific type if it has not been created before
Distributed
Write Consistency
Refresh
Timeout
$ curl -XDELETE 'http://localhost:9200/twitter/tweet/1?timeout=5m'
Update API
curl -XPUT localhost:9200/test/type1/1 -d '{
"counter" : 1,
"tags" : ["red"]
}'
Scripted updates
Increment the counter
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : {
"inline": "ctx._source.counter += count",
"params" : {
"count" : 4
}
}
}'
Add a tag (no duplication check)
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : {
"inline": "ctx._source.tags += tag",
"params" : {
"tag" : "blue"
}
}
}'
ctx map
- _source
- _index
- _type
- _id
- _version
- _routing
- _parent
- _timestamp
- _ttl
Add field
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : "ctx._source.name_of_new_field = \"value_of_new_field\""
}'
Remove field
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : "ctx._source.remove(\"name_of_field\")"
}'
Condition
delete or noop
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : {
"inline": "ctx._source.tags.contains(tag) ? ctx.op = \"delete\" : ctx.op = \"none\"",
"params" : {
"tag" : "blue"
}
}
}'
Updates with a partial document
Merge
Simple recursive merge, inner merging of objects, replacing core "keys/values" and arrays
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"doc" : {
"name" : "new_name"
}
}'
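The merge rule described above (objects merged recursively, core values and arrays replaced) can be illustrated with a short Python sketch. The function name merge_partial and the sample document are hypothetical; this mirrors the description in these notes rather than Elasticsearch's actual implementation.

```python
def merge_partial(stored, partial):
    """Recursively merge a partial document into a stored one, in place."""
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(stored.get(key), dict):
            merge_partial(stored[key], value)  # inner merging of objects
        else:
            stored[key] = value  # core values and arrays are replaced
    return stored


doc = {"counter": 1, "tags": ["red"], "author": {"name": "old", "age": 3}}
merge_partial(doc, {"tags": ["blue"], "author": {"name": "new_name"}})
print(doc)
# {'counter': 1, 'tags': ['blue'], 'author': {'name': 'new_name', 'age': 3}}
```

Note that the tags array is replaced as a whole, while the author object keeps its untouched age key.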
If both doc and script are specified, doc is ignored
Detecting noop updates
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"doc" : {
"name" : "new_name"
},
"detect_noop": false
}'
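What detect_noop decides can be sketched in a few lines of Python. This is a simplified illustration, assuming a shallow merge is enough for the example; apply_partial_doc is a hypothetical helper, and the returned flag merely stands for "the document was written again".

```python
def apply_partial_doc(stored_source, partial, detect_noop=True):
    """Return (new_source, reindexed) for an update with a partial document."""
    merged = {**stored_source, **partial}  # shallow merge, for brevity
    if detect_noop and merged == stored_source:
        return stored_source, False  # nothing changed, so the write is skipped
    return merged, True


stored = {"name": "new_name", "counter": 1}
print(apply_partial_doc(stored, {"name": "new_name"}))
# ({'name': 'new_name', 'counter': 1}, False)
print(apply_partial_doc(stored, {"name": "new_name"}, detect_noop=False))
# ({'name': 'new_name', 'counter': 1}, True)
```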
Upserts
upsert
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : {
"inline": "ctx._source.counter += count",
"params" : {
"count" : 4
}
},
"upsert" : {
"counter" : 1
}
}'
scripted_upsert
curl -XPOST 'localhost:9200/sessions/session/dh3sgudg8gsrgl/_update' -d '{
"scripted_upsert":true,
"script" : {
"id": "my_web_session_summariser",
"params" : {
"pageViewEvent" : {
"url":"foo.com/bar",
"response":404,
"time":"2014-01-01 12:32"
}
}
},
"upsert" : {}
}'
doc_as_upsert
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"doc" : {
"name" : "new_name"
},
"doc_as_upsert" : true
}'
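The three upsert variants above differ only in what happens when the document does not exist yet. A minimal in-memory sketch in Python, assuming a plain dict as the store and a Python callable in place of the script; apply_update and add_four are hypothetical names, and the partial-document merge is kept shallow for brevity.

```python
def apply_update(store, doc_id, script=None, doc=None, upsert=None,
                 scripted_upsert=False, doc_as_upsert=False):
    """Apply an update request to an in-memory store and return the document."""
    existing = store.get(doc_id)
    if existing is None:
        if doc_as_upsert and doc is not None:
            store[doc_id] = dict(doc)  # the partial doc becomes the new document
        elif upsert is not None:
            new_doc = dict(upsert)
            if scripted_upsert and script is not None:
                script(new_doc)  # the script also runs on a missing document
            store[doc_id] = new_doc
        else:
            raise KeyError(f"document {doc_id!r} is missing")
    elif script is not None:
        script(existing)  # the document exists, so the script runs instead
    elif doc is not None:
        existing.update(doc)  # shallow merge, for brevity
    return store[doc_id]


def add_four(source):
    source["counter"] += 4


store = {}
print(apply_update(store, "1", script=add_four, upsert={"counter": 1}))  # {'counter': 1}
print(apply_update(store, "1", script=add_four, upsert={"counter": 1}))  # {'counter': 5}
```

The first call inserts the upsert content unchanged; only the second call, when the document exists, runs the script.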
Parameters
retry_on_conflict
In between the get and indexing phases of the update, it is possible that another process might have already updated the same document. By default, the update will fail with a version conflict exception. The retry_on_conflict parameter controls how many times to retry the update before finally throwing an exception.
routing
Routing is used to route the update request to the right shard and sets the routing for the upsert request if the document being updated doesn’t exist. Can’t be used to update the routing of an existing document.
parent
Parent is used to route the update request to the right shard and sets the parent for the upsert request if the document being updated doesn’t exist. Can’t be used to update the parent of an existing document.
timeout
Timeout waiting for a shard to become available.
consistency
The write consistency of the index/delete operation.
refresh
Refresh the relevant primary and replica shards (not the whole index) immediately after the operation occurs, so that the updated document appears in search results immediately.
fields
Return the relevant fields from the updated document. Specify _source to return the full updated source.
version & version_type
The update API uses Elasticsearch's versioning support internally to make sure the document doesn't change during the update. You can use the version parameter to specify that the document should only be updated if its version matches the one specified. By setting the version type to force you can force the new version of the document after the update (use with care! With force there is no guarantee the document didn't change). Version types external & external_gte are not supported.
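The interplay of the get phase, the indexing phase and retry_on_conflict can be simulated with a small in-memory sketch in Python. Everything here is hypothetical (the Store class, VersionConflict and update are illustration names, not client API); it only shows the get, change, version-checked write loop that the parameter description implies.

```python
class VersionConflict(Exception):
    pass


class Store:
    """In-memory stand-in for a shard: id -> (version, source)."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, expected_version):
        current_version, _ = self.docs[doc_id]
        if current_version != expected_version:
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        self.docs[doc_id] = (current_version + 1, source)


def update(store, doc_id, change, retry_on_conflict=0):
    """Get, apply the change, then index; retry when the version moved."""
    attempts = retry_on_conflict + 1
    for attempt in range(attempts):
        version, source = store.get(doc_id)
        new_source = change(dict(source))
        try:
            store.index(doc_id, new_source, expected_version=version)
            return version + 1
        except VersionConflict:
            if attempt == attempts - 1:
                raise  # retries exhausted, surface the conflict
```

With retry_on_conflict=0 a concurrent write between the two phases makes the update fail; with a higher value the loop re-reads the document and applies the change to the newer version.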
Multi-document APIs
Multi Get API
curl 'localhost:9200/_mget' -d '{
"docs" : [
{
"_index" : "test",
"_type" : "type",
"_id" : "1"
},
{
"_index" : "test",
"_type" : "type",
"_id" : "2"
}
]
}'
against index
curl 'localhost:9200/test/_mget' -d '{
"docs" : [
{
"_type" : "type",
"_id" : "1"
},
{
"_type" : "type",
"_id" : "2"
}
]
}'
against type
curl 'localhost:9200/test/type/_mget' -d '{
"docs" : [
{
"_id" : "1"
},
{
"_id" : "2"
}
]
}'
ids
curl 'localhost:9200/test/type/_mget' -d '{
"ids" : ["1", "2"]
}'
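Optional Type
_type is optional; set it to _all or leave it empty to fetch the first document matching the ID across all types
Without a type, documents sharing the same _id resolve to the first match, so this request returns the same document twice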
curl 'localhost:9200/test/_mget' -d '{
"ids" : ["1", "1"]
}'
explicit
GET /test/_mget/
{
"docs" : [
{
"_type":"typeA",
"_id" : "1"
},
{
"_type":"typeB",
"_id" : "1"
}
]
}
Source filtering
_source, _source_include & _source_exclude
curl 'localhost:9200/_mget' -d '{
"docs" : [
{
"_index" : "test",
"_type" : "type",
"_id" : "1",
"_source" : false
},
{
"_index" : "test",
"_type" : "type",
"_id" : "2",
"_source" : ["field3", "field4"]
},
{
"_index" : "test",
"_type" : "type",
"_id" : "3",
"_source" : {
"include": ["user"],
"exclude": ["user.location"]
}
}
]
}'
Fields
example
curl 'localhost:9200/_mget' -d '{
"docs" : [
{
"_index" : "test",
"_type" : "type",
"_id" : "1",
"fields" : ["field1", "field2"]
},
{
"_index" : "test",
"_type" : "type",
"_id" : "2",
"fields" : ["field3", "field4"]
}
]
}'
specify default
curl 'localhost:9200/test/type/_mget?fields=field1,field2' -d '{
"docs" : [
{
"_id" : "1"
},
{
"_id" : "2",
"fields" : ["field3", "field4"]
}
]
}'
result:
"_id" : "1" => returns field1 and field2
"_id" : "2" => returns field3 and field4
Generated fields
Fields are generated only when indexing.
Routing
example
curl 'localhost:9200/_mget?routing=key1' -d '{
"docs" : [
{
"_index" : "test",
"_type" : "type",
"_id" : "1",
"_routing" : "key2"
},
{
"_index" : "test",
"_type" : "type",
"_id" : "2"
}
]
}'
result:
test/type/1 => key2
test/type/2 => key1
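The fields and routing examples follow the same rule: a value given on an individual docs entry overrides the default taken from the URL. A minimal Python sketch of that rule, where resolve_per_doc is a hypothetical helper and the request bodies are reduced to the keys that matter:

```python
def resolve_per_doc(docs, key, url_default):
    """Map each requested _id to its own value for key, else the URL default."""
    return {doc["_id"]: doc.get(key, url_default) for doc in docs}


# ?fields=field1,field2 with a per-document override for id 2
fields = resolve_per_doc(
    [{"_id": "1"}, {"_id": "2", "fields": ["field3", "field4"]}],
    "fields",
    ["field1", "field2"],
)
print(fields)  # {'1': ['field1', 'field2'], '2': ['field3', 'field4']}

# ?routing=key1 with a per-document _routing for id 1
routing = resolve_per_doc(
    [{"_id": "1", "_routing": "key2"}, {"_id": "2"}],
    "_routing",
    "key1",
)
print(routing)  # {'1': 'key2', '2': 'key1'}
```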
Security
URL-based access control
Bulk API
The bulk API makes it possible to perform many index/delete operations in a single API call
Increase the indexing speed
Client support for bulk requests
- Perl
- Python
Endpoint /_bulk
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
Actions
index
create
delete
update
Curl
When providing text file input to curl, use --data-binary instead of plain -d
The latter doesn't preserve newlines
$ cat requests
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1}}]}
Example
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
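In the example above, doc for the update action is a partial document that is merged with the already stored document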
endpoints (an index or index/type in the URL is the default for items that don't specify one explicitly)
/_bulk
/{index}/_bulk
/{index}/{type}/_bulk
only action_meta_data is parsed on the receiving node side (fast)
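response - large JSON structure with the individual result of each action
number of actions - there is no single correct number; tune it per workload
HTTP API - make sure the client doesn't send HTTP chunks (they slow things down)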
Versioning
Routing
Parent
Timestamp
Deprecated in 2.0.0-beta2.
TTL
Deprecated in 2.0.0-beta2.
Write Consistency
Refresh
Update
_retry_on_conflict is given inline in the action line itself (not in the extra payload line)
Supports
- doc (partial document)
- upsert
- doc_as_upsert
- script
- params (for script)
- lang (for script)
- fields
Example
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "script" : { "inline": "ctx._source.counter += param1", "lang" : "js", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_type" : "type1", "_index" : "index1", "fields" : ["_source"]} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "fields": ["_source"]}
Security
Term Vectors
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html
Multi termvectors API
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-termvectors.html
Reference
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html