Among the new features introduced in MongoDB 3.6, change streams are undoubtedly one of the most attractive.

  Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog.

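  Change streams let applications receive changes to MongoDB data in real time. This was a widely requested feature, useful for ETL, cross-platform data synchronization, notification services, and more. Before change streams existed, you could track modifications by tailing the oplog, but that was a complicated and risky workaround.

  Original post: https://www.cnblogs.com/xybaby/p/9464328.html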

Change Stream Features

The article an-introduction-to-change-streams summarizes several characteristics of change streams:

Targeted changes

  Changes can be filtered to provide relevant and targeted changes to listening applications.
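As a rough illustration of targeted changes, the sketch below defines a `$match` stage of the kind that can be passed to `db.users.watch([...])`, plus a small local stand-in that shows which events such a filter would let through. The helper `passesFilter` and the sample events are hypothetical; a real deployment evaluates the pipeline on the server.

```javascript
// A $match stage as it could be passed to db.users.watch(pipeline) in the mongo shell.
const pipeline = [{ $match: { operationType: { $in: ["insert", "delete"] } } }];

// Hypothetical local stand-in for that one filter shape, so the effect can be
// seen without a server. It only understands { operationType: { $in: [...] } }.
function passesFilter(event, stages) {
  return stages.every((stage) =>
    stage.$match.operationType.$in.includes(event.operationType)
  );
}

// Minimal sample events (only the field the filter looks at).
const sampleEvents = [
  { operationType: "insert" },
  { operationType: "update" },
  { operationType: "delete" },
  { operationType: "replace" },
];

const delivered = sampleEvents
  .filter((event) => passesFilter(event, pipeline))
  .map((event) => event.operationType);

console.log(delivered); // the update and replace events are filtered out
```

Filtering on the server side means the listening application never receives events it does not care about.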

Resumability

  Resumability was top of mind when building change streams to ensure that applications can see every change in a collection. (The mechanism behind this is the resume token.)

Total ordering

  MongoDB 3.6 has a global logical clock that enables the server to order all changes across a sharded cluster.

Durability

  Change streams only include majority-committed changes.

Security

  Change streams are secure – users are only able to create change streams on collections to which they have been granted read access.

Ease of use

  Change streams are familiar – the API syntax takes advantage of the established MongoDB drivers and query language, and are independent of the underlying oplog format.

Idempotence

  All changes are transformed into a format that’s safe to apply multiple times. Listening applications can use a resume token from any prior change stream event, not just the most recent one, because reapplying operations is safe and will reach the same consistent state.
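To make the idempotence claim concrete, here is a minimal sketch in plain JavaScript that applies change events to a local copy of the data and then replays part of the stream. The helper names `applyChangeEvent` and `replay` are hypothetical, `_id` values are plain strings rather than ObjectIds, and dotted paths for nested fields are not handled.

```javascript
// Hypothetical helper: apply one change event to a local replica (a Map keyed by _id).
function applyChangeEvent(replica, event) {
  const id = event.documentKey._id;
  switch (event.operationType) {
    case "insert":
    case "replace":
      // The event carries the whole document, so writing it twice is harmless.
      replica.set(id, { ...event.fullDocument });
      break;
    case "update": {
      // updatedFields holds final values (like $set), not increments.
      const doc = { ...(replica.get(id) || { _id: id }) };
      Object.assign(doc, event.updateDescription.updatedFields);
      for (const field of event.updateDescription.removedFields) {
        delete doc[field];
      }
      replica.set(id, doc);
      break;
    }
    case "delete":
      replica.delete(id);
      break;
    default:
      // invalidate and unknown types leave the replica untouched.
      break;
  }
  return replica;
}

// Sample events shaped like the ones shown in this post.
const events = [
  { operationType: "insert", documentKey: { _id: "a" },
    fullDocument: { _id: "a", username: "test1", age: 18 } },
  { operationType: "update", documentKey: { _id: "a" },
    updateDescription: { updatedFields: { age: 19 }, removedFields: [] } },
  { operationType: "insert", documentKey: { _id: "b" },
    fullDocument: { _id: "b", username: "test3", age: 14 } },
  { operationType: "delete", documentKey: { _id: "b" } },
];

// Build a fresh replica by applying a list of events in order.
function replay(eventList) {
  const replica = new Map();
  for (const event of eventList) applyChangeEvent(replica, event);
  return replica;
}

const once = replay(events);
// Simulate resuming from an earlier token: events 2..4 are applied a second time.
const replayed = replay([...events, ...events.slice(1)]);

console.log(JSON.stringify([...once]) === JSON.stringify([...replayed])); // true
```

The replayed run ends in the same state because every event describes a resulting value rather than a relative change.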

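  Compared with tailing the oplog yourself, change streams have the following advantages:

  • If an operation has been persisted on only a single node, the corresponding oplog entry may still be rolled back, whereas change streams provide durability: they only report majority-committed changes.
  • In a sharded cluster, a change stream spans shards: you watch changes through mongos instead of tailing the oplog on each shard's replica set separately.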
 

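  Change streams place some requirements on the MongoDB deployment:

  • They are only available on replica sets and sharded clusters (in MongoDB 3.6, every shard must be a replica set). This is easy to understand, because change streams are built on the oplog. For a sharded cluster, you must connect through mongos.
  • The storage engine must be WiredTiger, and the replica set must use replica set protocol version 1.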
 

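Trying Out Change Streams

  The article "Tutorial: Try the MongoDB Cloud Database (MongoDB Atlas) for Free" describes how to use the cloud database service offered by MongoDB Atlas. The free cluster happens to be a replica set running the WiredTiger engine, so this post uses that environment for testing. The tests cover all events (change events) supported by change streams, the fullDocument option, and resuming a stream.

  The change events are: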

  • insert
  • delete
  • replace
  • update
  • invalidate

  Interestingly, compared with CRUD there is an extra replace event. The difference between update and replace is:

  A replace operation uses the update command, and consists of two stages:
  • Delete the original document with the documentKey and
  • Insert the new document using the same documentKey

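  Test method: start two mongo shells, one to operate on the database and one to watch. To tell them apart, each transcript below is labeled Operate or Watch.

Preparing the Environment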

  Operate

MongoDB Enterprise free-shard-0:PRIMARY> use engineering
switched to db engineering

  Watch

MongoDB Enterprise free-shard-0:PRIMARY> use engineering
switched to db engineering
MongoDB Enterprise free-shard-0:PRIMARY> cursor = db.users.watch()
assert: command failed: {
"operationTime" : Timestamp(1533888296, 2),
"ok" : 0,
"errmsg" : "cannot open $changeStream for non-existent database: engineering",
"code" : 26,
"codeName" : "NamespaceNotFound",
"$clusterTime" : {
"clusterTime" : Timestamp(1533888296, 2),
"signature" : {
"hash" : BinData(0,"fWTN4Kuv7cq9xCcC0vCF4AkTxuU="),
"keyId" : NumberLong("6563302068054917121")
}
}
} : aggregate failed

  The watch error shows that you can only watch a database that already exists, so first insert a document to create the database and the collection.

  Operate

MongoDB Enterprise free-shard-0:PRIMARY> db.users.insert({'username': 'test1', age: 18, 'email':'test1@gmail.con'})
WriteResult({ "nInserted" : 1 })
  Watch
MongoDB Enterprise free-shard-0:PRIMARY> cursor = db.users.watch()
MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
2018-08-10T16:08:49.200+0800 E QUERY [thread1] Error: error hasNext: false :
DBCommandCursor.prototype.next@src/mongo/shell/query.js:853:1
@(shell):1:1

  At this point the cursor used for listening has been created, but there are no change events yet, so next() fails with hasNext: false.

Insert

  Operate

MongoDB Enterprise free-shard-0:PRIMARY> db.users.insert({'username': 'test2', age: 19, 'email':'test2@gmail.con'})
WriteResult({ "nInserted" : 1 })

  Watch

MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
{
"_id" : {
"_data" : BinData(0,"glttSC0AAAADRmRfaWQAZFttSCb45nBxa/FSsABaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "insert",
"fullDocument" : {
"_id" : ObjectId("5b6d4826f8e670716bf152b0"),
"username" : "test2",
"age" : 19,
"email" : "test2@gmail.con"
},
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6d4826f8e670716bf152b0")
}
}

replace

  Operate

MongoDB Enterprise free-shard-0:PRIMARY> db.users.update({username: "test1"}, {age: 19})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

  Watch

MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
{
"_id" : {
"_data" : BinData(0,"glttSSMAAAACRmRfaWQAZFttR+r45nBxa/FSrwBaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "replace",
"fullDocument" : {
"_id" : ObjectId("5b6d47eaf8e670716bf152af"),
"age" : 19
},
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6d47eaf8e670716bf152af")
}
}

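  As you can see, the operation used db.collection.update, but the change event is replace. The reason is explained in replace-a-document-entirely:

If the <update> document contains only field:value expressions, then:

  In other words, an update document that contains no update operators replaces the whole matched document, keeping only its _id.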
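The rule can be sketched as a small predicate over the shape of the update document. `expectedOperationType` is a hypothetical helper written for this post, not part of any MongoDB API, and it only considers plain-object update documents.

```javascript
// Hypothetical helper: predict the change event type for an update call,
// based only on the shape of the update document.
function expectedOperationType(updateDoc) {
  const keys = Object.keys(updateDoc);
  const operatorKeys = keys.filter((key) => key.startsWith("$"));
  if (operatorKeys.length === 0) {
    // Only field:value pairs: the matched document is replaced as a whole.
    return "replace";
  }
  if (operatorKeys.length === keys.length) {
    // Only update operators such as $set: the document is modified in place.
    return "update";
  }
  // The server rejects documents that mix operators and plain fields.
  throw new Error("update document mixes operators and plain fields");
}

console.log(expectedOperationType({ age: 19 }));           // "replace"
console.log(expectedOperationType({ $set: { age: 19 } })); // "update"
```

This matches the two transcripts in this post: `{age: 19}` produced a replace event, while `{$set: {age: 19}}` produced an update event.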

delete

  Operate

MongoDB Enterprise free-shard-0:PRIMARY> db.users.remove({ "_id" : ObjectId("5b6d47eaf8e670716bf152af")})
WriteResult({ "nRemoved" : 1 })

  Watch

MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
{
"_id" : {
"_data" : BinData(0,"glttSfAAAAAFRmRfaWQAZFttR+r45nBxa/FSrwBaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "delete",
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6d47eaf8e670716bf152af")
}
}

update

  Operate

MongoDB Enterprise free-shard-0:PRIMARY> db.users.insert({'username': 'test1', age: 18, 'email':'test1@gmail.con'})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise free-shard-0:PRIMARY> db.users.update({username: "test1"}, {$set: {age: 19}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

  Watch

MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
{
"_id" : {
"_data" : BinData(0,"glttSmQAAAAERmRfaWQAZFttSlz45nBxa/FSsgBaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "insert",
"fullDocument" : {
"_id" : ObjectId("5b6d4a5cf8e670716bf152b2"),
"username" : "test1",
"age" : 18,
"email" : "test1@gmail.con"
},
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6d4a5cf8e670716bf152b2")
}
}
MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
{
"_id" : {
"_data" : BinData(0,"glttSn0AAAABRmRfaWQAZFttSlz45nBxa/FSsgBaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "update",
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6d4a5cf8e670716bf152b2")
},
"updateDescription" : {
"updatedFields" : {
"age" : 19
},
"removedFields" : [ ]
}
}

update fullDocument

  db.collection.watch accepts a fullDocument option. With it set to 'updateLookup', an update change event also returns the complete current version of the affected document.

MongoDB Enterprise free-shard-0:PRIMARY> cursor = db.users.watch([], {fullDocument:'updateLookup'} )

  Operate

MongoDB Enterprise free-shard-0:PRIMARY> db.users.update({username: "test1"}, {$set: {age: 29}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

  Watch

MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
{
"_id" : {
"_data" : BinData(0,"glttS88AAAAERmRfaWQAZFttSlz45nBxa/FSsgBaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "update",
"fullDocument" : {
"_id" : ObjectId("5b6d4a5cf8e670716bf152b2"),
"username" : "test1",
"age" : 29,
"email" : "test1@gmail.con"
},
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6d4a5cf8e670716bf152b2")
},
"updateDescription" : {
"updatedFields" : {
"age" : 29
},
"removedFields" : [ ]
}
}

resume change stream

  Operate

MongoDB Enterprise free-shard-0:PRIMARY> db.users.insert({"username": "test3", "age": 14})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise free-shard-0:PRIMARY> db.users.insert({"username": "test3", "age": 14})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise free-shard-0:PRIMARY> db.users.remove({"username": "test3"})
WriteResult({ "nRemoved" : 2 })

  Watch

MongoDB Enterprise free-shard-0:PRIMARY> ret = cursor.next()
{
"_id" : {
"_data" : BinData(0,"gltusJ4AAAABRmRfaWQAZFtusJ5f9Jy7Q0jALABaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "insert",
"fullDocument" : {
"_id" : ObjectId("5b6eb09e5ff49cbb4348c02c"),
"username" : "test3",
"age" : 14
},
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6eb09e5ff49cbb4348c02c")
}
}
MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
{
"_id" : {
"_data" : BinData(0,"gltusKAAAAABRmRfaWQAZFtusJ9f9Jy7Q0jALQBaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "insert",
"fullDocument" : {
"_id" : ObjectId("5b6eb09f5ff49cbb4348c02d"),
"username" : "test3",
"age" : 14
},
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6eb09f5ff49cbb4348c02d")
}
}
MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
{
"_id" : {
"_data" : BinData(0,"gltusK8AAAABRmRfaWQAZFtusJ5f9Jy7Q0jALABaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "delete",
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6eb09e5ff49cbb4348c02c")
}
}
MongoDB Enterprise free-shard-0:PRIMARY> cursor.next()
{
"_id" : {
"_data" : BinData(0,"gltusK8AAAACRmRfaWQAZFtusJ9f9Jy7Q0jALQBaEAQMcjq0rdpL+LTQHXFkm7J7BA==")
},
"operationType" : "delete",
"ns" : {
"db" : "engineering",
"coll" : "users"
},
"documentKey" : {
"_id" : ObjectId("5b6eb09f5ff49cbb4348c02d")
}
}

  Resume Watch

MongoDB Enterprise free-shard-0:PRIMARY> resume_cursor = db.users.watch([], {"resumeAfter": ret['_id']})
{ "_id" : { "_data" : BinData(0,"gltusKAAAAABRmRfaWQAZFtusJ9f9Jy7Q0jALQBaEAQMcjq0rdpL+LTQHXFkm7J7BA==") }, "operationType" : "insert", "fullDocument" : { "_id" : ObjectId("5b6eb09f5ff49cbb4348c02d"), "username" : "test3", "age" : 14 }, "ns" : { "db" : "5b6d2180df9db10e4ba91d60_engineering", "coll" : "users" }, "documentKey" : { "_id" : ObjectId("5b6eb09f5ff49cbb4348c02d") } }
{ "_id" : { "_data" : BinData(0,"gltusK8AAAABRmRfaWQAZFtusJ5f9Jy7Q0jALABaEAQMcjq0rdpL+LTQHXFkm7J7BA==") }, "operationType" : "delete", "ns" : { "db" : "5b6d2180df9db10e4ba91d60_engineering", "coll" : "users" }, "documentKey" : { "_id" : ObjectId("5b6eb09e5ff49cbb4348c02c") } }
{ "_id" : { "_data" : BinData(0,"gltusK8AAAACRmRfaWQAZFtusJ9f9Jy7Q0jALQBaEAQMcjq0rdpL+LTQHXFkm7J7BA==") }, "operationType" : "delete", "ns" : { "db" : "5b6d2180df9db10e4ba91d60_engineering", "coll" : "users" }, "documentKey" : { "_id" : ObjectId("5b6eb09f5ff49cbb4348c02d") } }
MongoDB Enterprise free-shard-0:PRIMARY> resume_cursor.next()
2018-08-11T17:49:13.127+0800 E QUERY [thread1] Error: error hasNext: false :
DBCommandCursor.prototype.next@src/mongo/shell/query.js:853:1
@(shell):1:1

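  For resume_cursor, resumeAfter is set to the _id (the resume token) of an event returned earlier (ret, the first insert). The new watch then returns, in one batch, every event after that token, even though the original cursor had already consumed them.

Change Stream Applications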
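The resume behavior can be modeled with a few lines of plain JavaScript. This is only an in-memory stand-in: `eventsAfter` is a hypothetical helper, and the token strings are shortened placeholders rather than real resume tokens.

```javascript
// Hypothetical in-memory model of resumeAfter: return every event that
// follows the event identified by the given resume token.
function eventsAfter(eventLog, resumeToken) {
  const index = eventLog.findIndex(
    (event) => event._id._data === resumeToken._data
  );
  if (index === -1) {
    // A real server also fails if the token is no longer available in the oplog.
    throw new Error("resume token not found in the log");
  }
  return eventLog.slice(index + 1);
}

// Same sequence as the transcript: two inserts followed by two deletes.
const eventLog = [
  { _id: { _data: "token-1" }, operationType: "insert" },
  { _id: { _data: "token-2" }, operationType: "insert" },
  { _id: { _data: "token-3" }, operationType: "delete" },
  { _id: { _data: "token-4" }, operationType: "delete" },
];

const ret = eventLog[0]; // the first event the original cursor returned
const resumed = eventsAfter(eventLog, ret._id).map((e) => e.operationType);

console.log(resumed); // the second insert and both deletes
```

An application that stores the token of the last event it fully processed can restart from exactly that point after a crash.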

CDC in DDIA

  The book Designing Data-Intensive Applications has a section on Change Data Capture (CDC), which discusses how the replication log of a replicated database can be used. For MongoDB, the replication log is the oplog. The book notes:

The problem with most databases’ replication logs is that they have long been considered to be an internal implementation detail of the database, not a public API.

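  In other words, applications (clients) are expected to use the database through the interface it exposes, rather than reading and parsing the replication log directly. Yet the replication log can be used directly to build search indexes, caches, and data warehouses, as the figure in the original post illustrates: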

   

  change data capture (CDC), which is the process of observing all data changes written to a database and extracting them in a form in which they can be replicated to other systems.

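  CDC turns the search index and the data warehouse into derived data systems, which can also be thought of as views of the database's data. Interestingly, the system formed by the DB, the replication log, and the derived data systems in the figure looks a lot like single-leader replication: the DB is the leader (primary), and the derived data systems (cache, data warehouse) are the followers (secondaries).

  Change streams have a very wide range of applications. The article "Perfect Data Migration: An Application of MongoDB Stream" describes using change streams to migrate data during a move to a service-oriented architecture, and gives a complete example. The article USING MONGODB AS A REALTIME DATABASE WITH CHANGE STREAMS also gives a simple use case with Node.js.

Change Stream Implementation and Issues

The official documentation has some notes on using change streams on a sharded cluster. The following points are worth noting: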

(1)

To guarantee total ordering of changes, for each change notification the mongos checks with each shard to see if the shard has seen more recent changes.

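  For every notification, mongos has to check with all shards, whether or not a given shard has any changes, which adds to the change stream's response time. With high network latency, for example geographically distributed shards, the problem is more pronounced. If data changes very frequently, the change stream may not be able to keep up.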
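The following is a deliberately simplified model of why a total order forces a check against every shard; it is not how mongos is actually implemented. The function `emitOrdered`, the `reportedTime` field, and the use of plain numbers for cluster time are all assumptions made for illustration.

```javascript
// Simplified model: an event may be emitted only once every shard has reported
// a cluster time at or beyond it; otherwise a slower shard could still
// produce an earlier event and break the global order.
function emitOrdered(shards) {
  const safeTime = Math.min(...shards.map((shard) => shard.reportedTime));
  const ready = [];
  for (const shard of shards) {
    // Move events at or before safeTime out of the shard's pending queue.
    shard.pending = shard.pending.filter((event) => {
      if (event.clusterTime <= safeTime) {
        ready.push(event);
        return false;
      }
      return true;
    });
  }
  return ready.sort((a, b) => a.clusterTime - b.clusterTime);
}

// Hypothetical shards; clusterTime is a plain number instead of a Timestamp.
const shards = [
  { name: "shardA", reportedTime: 9,
    pending: [{ clusterTime: 5, op: "insert" }, { clusterTime: 9, op: "update" }] },
  { name: "shardB", reportedTime: 7,
    pending: [{ clusterTime: 7, op: "delete" }] },
];

// shardB has only reached time 7, so the event at time 9 must wait.
const firstBatch = emitOrdered(shards).map((event) => event.clusterTime);

// Once shardB reports a later time, the held event can be released.
shards[1].reportedTime = 10;
const secondBatch = emitOrdered(shards).map((event) => event.clusterTime);

console.log(firstBatch, secondBatch);
```

In this model the slowest shard sets the pace for the whole stream, which is the latency concern described above.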

(2)  

For sharded collections, update operations with multi : true may cause any change streams opened against that collection to send notifications for orphaned documents.

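  For update operations with multi: true, the operation may also run on orphaned documents, which produces spurious change events that the application may need to handle. As an aside, orphaned documents are a troublesome problem in general.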

   

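  In addition, MongoDB 3.6 can only watch a single collection. To follow write events on multiple collections or databases, you have to open a separate change stream for each one. The article MongoDB 3.6 Change Streams: A Nest Temperature and Fan Control Use Case notes that this may cause performance problems: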

  It’s estimated that after 1000 streams you will start to see very measurable performance drops

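  In MongoDB 4.0, however, you can open a change stream at the database level or even at the cluster level, which is much more convenient for applications and avoids this performance problem.

Summary

  This post introduced MongoDB's new change stream feature and some issues to watch for in real applications, and tried it out on MongoDB Atlas. Change streams are clearly a promising feature that can solve many problems that are currently awkward to implement. Before using them in production, though, they need extensive testing, especially for fault tolerance and performance.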
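With a single database-level stream, the application has to route each event by its ns field. A minimal sketch of such routing follows; `makeDispatcher` is a hypothetical helper, and the `orders` and `logs` collections are made up for the example.

```javascript
// Hypothetical dispatcher: route events from one database-level stream
// to per-collection handlers using the ns field of each event.
function makeDispatcher(handlers) {
  return function dispatch(event) {
    const key = `${event.ns.db}.${event.ns.coll}`;
    const handler = handlers[key];
    if (!handler) {
      return false; // no handler registered for this namespace
    }
    handler(event);
    return true;
  };
}

const seen = [];
const dispatch = makeDispatcher({
  "engineering.users": (event) => seen.push(`users:${event.operationType}`),
  "engineering.orders": (event) => seen.push(`orders:${event.operationType}`),
});

// Events as they might arrive from one stream covering the whole database.
const handled = [
  dispatch({ operationType: "insert", ns: { db: "engineering", coll: "users" } }),
  dispatch({ operationType: "delete", ns: { db: "engineering", coll: "orders" } }),
  dispatch({ operationType: "insert", ns: { db: "engineering", coll: "logs" } }),
];

console.log(handled, seen);
```

One stream plus a dispatcher replaces the many per-collection streams that MongoDB 3.6 would require.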

References

MongoDB Change Stream

an-introduction-to-change-streams

Tutorial: Try the MongoDB Cloud Database (MongoDB Atlas) for Free

Designing Data-Intensive Applications

Perfect Data Migration: An Application of MongoDB Stream

USING MONGODB AS A REALTIME DATABASE WITH CHANGE STREAMS
