Elasticsearch由浅入深（五）_version乐观锁、external version乐观锁、partial update、groovy脚本实现partial update

基于_version进行乐观锁并发控制

先构造一条数据出来

PUT /test_index/test_type/

{

  "test_field": "test test"

}

模拟两个客户端，都获取到了同一条数据

GET test_index/test_type/

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "found": true,

  "_source": {

    "test_field": "test test"

  }

}

其中一个客户端，先更新了一下这个数据

同时带上数据的版本号，确保说，es中的数据的版本号，跟客户端中的数据的版本号是相同的，才能修改

PUT /test_index/test_type/?version=

{

  "test_field": "test client 1"

}

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "result": "updated",

  "_shards": {

    "total": ,

    "successful": ,

    "failed":

  },

  "created": false

}

另外一个客户端，尝试基于version=1的数据去进行修改，同样带上version版本号，进行乐观锁的并发控制

PUT /test_index/test_type/?version=

{

  "test_field": "test client 2"

}

{

  "error": {

    "root_cause": [

      {

        "type": "version_conflict_engine_exception",

        "reason": "[test_type][7]: version conflict, current version [2] is different than the one provided [1]",

        "index_uuid": "6m0G7yx7R1KECWWGnfH1sw",

        "shard": "",

        "index": "test_index"

      }

    ],

    "type": "version_conflict_engine_exception",

    "reason": "[test_type][7]: version conflict, current version [2] is different than the one provided [1]",

    "index_uuid": "6m0G7yx7R1KECWWGnfH1sw",

    "shard": "",

    "index": "test_index"

  },

  "status":

}

在乐观锁成功阻止并发问题之后，尝试正确的完成更新

GET /test_index/test_type/

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "found": true,

  "_source": {

    "test_field": "test client 1"

  }

}

基于最新的数据和版本号，去进行修改，修改后，带上最新的版本号，可能这个步骤会需要反复执行好几次，才能成功，特别是在多线程并发更新同一条数据很频繁的情况下

PUT /test_index/test_type/?version=

{

  "test_field": "test client 2"

}

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "result": "updated",

  "_shards": {

    "total": ,

    "successful": ,

    "failed":

  },

  "created": false

}

基于external version进行乐观锁并发控制

什么是external version

es提供了一个feature，就是说，你可以不用它提供的内部_version版本号来进行并发控制，可以基于你自己维护的一个版本号来进行并发控制。举个列子，加入你的数据在mysql里也有一份，然后你的应用系统本身就维护了一个版本号，无论是什么自己生成的，程序控制的。这个时候，你进行乐观锁并发控制的时候，可能并不是想要用es内部的_version来进行控制，而是用你自己维护的那个version来进行控制。

?version=

?version=&version_type=external

version_type=external，唯一的区别在于，_version，只有当你提供的version与es中的_version一模一样的时候，才可以进行修改，只要不一样，就报错；当version_type=external的时候，只有当你提供的version比es中的_version大的时候，才能完成修改

es，_version=，?version=，才能更新成功

es，_version=，?version>&version_type=external，才能成功，比如说?version=&version_type=external

先构造一条数据

PUT /test_index/test_type/

{

  "test_field": "test"

}

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "result": "created",

  "_shards": {

    "total": ,

    "successful": ,

    "failed":

  },

  "created": true

}

模拟两个客户端同时查询到这条数据

GET /test_index/test_type/

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "found": true,

  "_source": {

    "test_field": "test"

  }

}

第一个客户端先进行修改，此时客户端程序是在自己的数据库中获取到了这条数据的最新版本号，比如说是2

PUT /test_index/test_type/?version=&version_type=external

{

  "test_field": "test client 1"

}

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "result": "updated",

  "_shards": {

    "total": ,

    "successful": ,

    "failed":

  },

  "created": false

}

模拟第二个客户端，同时拿到了自己数据库中维护的那个版本号，也是2，同时基于version=2发起了修改

PUT /test_index/test_type/?version=&version_type=external

{

  "test_field": "test client 2"

}

{

  "error": {

    "root_cause": [

      {

        "type": "version_conflict_engine_exception",

        "reason": "[test_type][8]: version conflict, current version [2] is higher or equal to the one provided [2]",

        "index_uuid": "6m0G7yx7R1KECWWGnfH1sw",

        "shard": "",

        "index": "test_index"

      }

    ],

    "type": "version_conflict_engine_exception",

    "reason": "[test_type][8]: version conflict, current version [2] is higher or equal to the one provided [2]",

    "index_uuid": "6m0G7yx7R1KECWWGnfH1sw",

    "shard": "",

    "index": "test_index"

  },

  "status":

}

在并发控制成功后，重新基于最新的版本号发起更新

GET /test_index/test_type/

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "found": true,

  "_source": {

    "test_field": "test client 1"

  }

}

PUT /test_index/test_type/?version=&version_type=external

{

  "test_field": "test client 2"

}

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "result": "updated",

  "_shards": {

    "total": ,

    "successful": ,

    "failed":

  },

  "created": false

}

partial update

简介

一般对应到应用程序中，每次的执行流程基本是这样的：

应用程序先发起一个get请求，获取到document，展示到前台界面，供用户查看和修改
用户在前台界面修改数据，发送到后台
后台代码，会将用户修改的数据在内存中进行执行，然后封装好修改后的全量数据
然后发送PUT请求，到es中，进行全量替换
es将老的document标记为deleted，然后重新创建一个新的document

partial update是仅仅进行部分更新，无需全量替换

语法：

post /index/type/id/_update

{

   "doc": {

      "要修改的少数几个field即可，不需要全量的数据"

   }

}

看起来，好像就比较方便了，每次就传递少数几个发生修改的field即可，不需要将全量的document数据发送过去

partial update原理及优点

全量更新的原理

其实es内部对partial update的实际执行,跟传统的全虽替换方式,是几乎一样的

内部先获取document
将传过来的field更新到document的json中
将老的document标记为deleted
将修改后的新的document创建出来

partial update相较于全星替换的优点：

所有的查询、修改和写国操作,都发生在es中的一个shard内部,避免了所有的网络数据传输的开销(减少2次网络请求) ,大大提升了性能
减少了查询和修改中的时间间隔,可以有效减少并发冲突的情况

示例

PUT /test_index/test_type/

{

  "test_field1": "test1",

  "test_field2": "test2"

}

partial update

POST /test_index/test_type//_update

{

  "doc": {

    "test_field2": "updated test2"

  }

}

基于groovy脚本实现partial update

ES其实是有个内置的脚本支持的，可以基于groovy脚本实现各种各样的复杂操作,基于groovy脚本，如何执行partial update

初始化脚本：

PUT /test_index/test_type/

{

  "num": ,

  "tags": []

}

内置脚本

POST /test_index/test_type//_update

{

  "script": "ctx._source.num+=1"

}

外部脚本
假设我们想在tags中新增内容
我们可以写成脚本，在 \config\scripts 目录下新建 test-add-yags.groovy 脚本：

ctx._source.tags+=new_tag

POST /test_index/test_type//_update

{

  "script": {

    "lang": "groovy",

    "file": "test-add-yags",

    "params": {

      "new_tag":"tag1"

    }

  }

}

GET /test_index/test_type/

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "found": true,

  "_source": {

    "num": ,

    "tags": [

      "tag1"

    ]

  }

}

用脚本删除文档
如果num等于count则删除
我们可以写成脚本，在 \config\scripts 目录下新建 test-delete-document.groovy 脚本：

ctx.op = ctx._source.num == count ? 'delete' : 'none'

POST /test_index/test_type//_update

{

  "script": {

    "lang": "groovy",

    "file": "test-delete-document",

    "params": {

      "count":

    }

  }

}

{

  "_index": "test_index",

  "_type": "test_type",

  "_id": "",

  "_version": ,

  "result": "deleted",

  "_shards": {

    "total": ,

    "successful": ,

    "failed":

  }

}

upsert操作
顾名思义：upsert=update+insert
不存在document进行更新

POST /test_index/test_type//_update

{

  "doc": {

    "num":

  }

}

{

  "error": {

    "root_cause": [

      {

        "type": "document_missing_exception",

        "reason": "[test_type][11]: document missing",

        "index_uuid": "d7GOSxVnTNKYuI8x7cZfkA",

        "shard": "",

        "index": "test_index"

      }

    ],

    "type": "document_missing_exception",

    "reason": "[test_type][11]: document missing",

    "index_uuid": "d7GOSxVnTNKYuI8x7cZfkA",

    "shard": "",

    "index": "test_index"

  },

  "status":

}

upsert操作

POST /test_index/test_type//_update

{

  "script": "ctx._source.num+=1",

  "upsert": {

    "num": ,

    "tags": []

  }

}

图解partial update乐观锁并发控制原理

partial update内部会自动执行我们之前所说的乐观锁的并发控制策

避免自动partial update fail掉的解决方案，使用 retry_on_conflict

retry策略

再次获取document数据和最新版本号
基于最新版本号再次去更新,如果成功那么就ok了、如果失败,重复1和2两个步骤,最多重复几次呢?可以通过retry那个参数的值指定,比如5次

示例：

post /index/type/id/_update?retry_on_conflict=&version=