Reposted from: https://pgloader.io/blog/continuous-migration/

After having been involved in many migration projects over the last 10 years, I decided to publish the following White Paper in order to share what I have learned.

The paper is titled Migrating to PostgreSQL, Tools and Methodology and details the Continuous Migration approach. It describes how to migrate from another relational database server technology to PostgreSQL. The reasons to do so are many, and first among them is often the licensing model.

The CommaFeed project reportedly migrated to PostgreSQL by doing little more than changing its connection string. While this is awesome news for that particular project, it is still pretty rare that changing your connection string is all you need to do to handle a migration!

If you’re less lucky than CommaFeed, you might want to prepare for a long-running project and allocate the necessary resources, both servers and people’s time. Even if you’re using an ORM and never wrote a SQL query yourself, your application is still sending SQL queries to the database, and maybe not all of them can be written in a PostgreSQL-compatible way.

Continuous Migration

Migrating to PostgreSQL is a project in its own right, and requires a good methodology in order to be successful. One important criterion for success is being able to deliver on time and on budget. This doesn’t mean that the first budget and timing estimates are going to be right; it means that the project is kept under control.

The best methodology for keeping things under control is still a complex research area in our field of computer science, though a few ideas remain the accepted foundation for it: split projects into smaller chunks and allow incremental progress towards the final goal.

To make it possible to split a migration project into chunks and allow for incremental progress, we are going to implement continuous migration:

  • Continuous migration is comparable to continuous integration and continuous deployment, or CI/CD.

  • The main idea is to first set up a target PostgreSQL environment and then use it every day as developers work on porting your software to this PostgreSQL platform.

  • As soon as a PostgreSQL environment exists, it’s possible to fork a CI/CD setup using the PostgreSQL branch of your code repository.

  • In parallel to porting the code to PostgreSQL, it’s then possible for the ops and DBA teams to make the PostgreSQL environment production ready by implementing backups, automated recovery, high availability, load balancing, and all the usual ops quality standards.
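As a minimal sketch of forking such a CI setup, assuming Docker is available on the build agents and that your test suite reads its connection string from a `DATABASE_URL` environment variable (both of which are assumptions — adapt to your own CI system):

```shell
#!/bin/sh
# Sketch of a CI job for the PostgreSQL branch: start a throwaway
# PostgreSQL instance, point the test suite at it, then tear it down.
set -e

# Start a disposable PostgreSQL container for this build.
docker run -d --name ci-pg \
  -e POSTGRES_PASSWORD=ci \
  -e POSTGRES_DB=appdb \
  -p 5432:5432 \
  postgres:16

# Wait until PostgreSQL accepts connections.
until docker exec ci-pg pg_isready -U postgres; do sleep 1; done

# Run the test suite against PostgreSQL instead of the legacy RDBMS.
DATABASE_URL="postgres://postgres:ci@localhost:5432/appdb" make test

# Clean up.
docker rm -f ci-pg
```

The point is that the PostgreSQL branch gets the exact same green/red feedback loop as the main branch, from day one.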

Setting up a continuous migration environment does more than allow for progress to happen in small chunks and in parallel with other work — it also means that your team members all have an opportunity to familiarize themselves with the new piece of technology that PostgreSQL might represent for them.

PostgreSQL Architecture

In order to be able to implement the Continuous Migration methodology, the first step involves setting up a PostgreSQL environment. A classic anti-pattern here is to simply host PostgreSQL on a virtual machine and just use that, putting off production-ready PostgreSQL architecture until later.

High Availability

A very classic PostgreSQL architecture to begin with involves WAL archiving and a standby server.

Automated Recovery

PostgreSQL exposes the generic configuration hooks archive_command and restore_command. These allow one to implement WAL archiving and point-in-time recovery by managing an archive of the change log (the WAL) of your production database service.

Now, rather than implementing those crucial scripts on your own, you can use production-ready WAL management applications such as pgbarman or pgbackrest. If you’re using cloud services, have a look at WAL-E if you want to use Amazon S3.

Don’t roll your own PITR script. It’s really easy to get wrong, and most home-grown projects only implement the backup part. A backup that you can’t restore is useless; what you actually want is an entirely automated recovery solution, and the projects listed above do just that.
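As an illustration, here is roughly what a pgbackrest setup looks like — the stanza name "main", the paths, and the retention setting are hypothetical placeholders, so check the pgbackrest documentation for your environment:

```shell
# /etc/pgbackrest.conf (hypothetical stanza named "main"):
#
#   [main]
#   pg1-path=/var/lib/postgresql/16/main
#
#   [global]
#   repo1-path=/var/lib/pgbackrest
#   repo1-retention-full=2

# Initialize the stanza, take a full backup, and verify the setup.
pgbackrest --stanza=main stanza-create
pgbackrest --stanza=main backup --type=full
pgbackrest --stanza=main check

# In postgresql.conf, WAL archiving is then delegated to pgbackrest:
#   archive_mode = on
#   archive_command = 'pgbackrest --stanza=main archive-push %p'

# And recovery is a single command rather than a handwritten script:
pgbackrest --stanza=main restore
```

Note how both directions — backup and restore — are covered by the same tool, which is exactly the property a handwritten archive_command script tends to lack.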

PostgreSQL Standby Servers

Once you have an automated recovery solution in place, you might want to reduce the possible downtime by having a standby server ready to take over.

To understand all the details about the setup, read all of the PostgreSQL documentation about high availability, load balancing, and replication and then read about logical replication.

Note that the PostgreSQL documentation is best in class. Patches that add or modify PostgreSQL features are only accepted when they also update all the affected documentation. This means that the documentation is always up-to-date and reliable. So when using PostgreSQL, get used to reading the official documentation a lot.

If you’re not sure about what to do now, set up a PostgreSQL Hot Standby physical replica by following the steps under hot standby. It looks more complex than it is. All you need to do is the following:

  1. Check your postgresql.conf and allow for replication
  2. Open replication privileges on the network in pg_hba.conf
  3. Use pg_basebackup to have a remote copy of your primary data
  4. Start your replica with a setup that connects to the primary
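The four steps above can be sketched as follows — host names, user names, addresses, and paths are placeholders:

```shell
# 1. postgresql.conf on the primary (recent versions default to
#    replication-friendly settings already):
#      wal_level = replica
#      max_wal_senders = 10

# 2. pg_hba.conf on the primary, allowing the standby to connect
#    (10.0.0.2 stands in for the standby's address):
#      host  replication  replicator  10.0.0.2/32  scram-sha-256

# Create the replication role on the primary.
psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';"

# 3. On the standby: take a base backup of the primary. The -R flag
#    writes the replication settings (primary_conninfo and the
#    standby.signal file) so the copy starts up as a standby.
pg_basebackup -h primary.example.com -U replicator \
  -D /var/lib/postgresql/16/main -X stream -R -P

# 4. Start the replica; it connects to the primary and streams WAL.
pg_ctl -D /var/lib/postgresql/16/main start
```
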

A PostgreSQL hot standby server replays its primary’s write-ahead log, applying the same binary file-level modifications as the primary itself, making the standby a streaming copy of the primary server. The PostgreSQL hot standby server is also open for read-only SQL traffic.

It’s called a hot standby because not only is the server open for read-only SQL queries — this read-only traffic also doesn’t get interrupted in case of a standby promotion!

Load Balancing and Fancy Architectures

Of course it’s possible to set up more complex PostgreSQL architectures. Starting with PostgreSQL 10 you can use Logical Replication. It’s easy to set up, as shown in the Logical Replication Quick Setup part of the PostgreSQL documentation.
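For reference, the quick setup boils down to a publication on one side and a subscription on the other; the database, table, and connection details here are placeholders:

```shell
# On the publisher (requires wal_level = logical in postgresql.conf):
psql -d appdb -c "CREATE PUBLICATION app_pub FOR TABLE users, orders;"

# On the subscriber (the tables must already exist with a matching
# schema, since logical replication only transfers row changes):
psql -d appdb -c "CREATE SUBSCRIPTION app_sub
  CONNECTION 'host=primary.example.com dbname=appdb user=replicator'
  PUBLICATION app_pub;"
```
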

Do you need such a setup to get started migrating your current application to PostgreSQL, though? Maybe not; it’s your call.

Continuous Integration, Continuous Delivery

That’s how we do it now, right? Your whole testing and delivery pipeline is automated using something like Jenkins or Travis, or an equivalent setup, or even something better. Well then, do the same thing for your migration project.

So we are now going to set up Continuous Migration to back you up for the duration of the migration project.

Nightly Migration of the Production Data

Chances are that once your data migration script is tweaked for all the data you’ve gone through, some new data is going to show up in production that will defeat your script.

To avoid data-related surprises on D-day, just run the whole data migration script against the production data every night, for the whole duration of the project. You will build such a good track record of dealing with new data that you will fear no surprises. In a migration project, surprises are seldom good.

“If it wasn’t for bad luck, I wouldn’t have no luck at all.”

Albert King, Born Under a Bad Sign
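A minimal sketch of such a nightly run, assuming pgloader is installed and driven by a cron entry — the connection strings, credentials, and log path are all placeholders:

```shell
#!/bin/sh
# nightly-migration.sh: reload yesterday's production data into the
# target PostgreSQL instance, keeping a log of every run so the track
# record of the migration script is auditable.
set -e

LOG=/var/log/migration/$(date +%F).log

# pgloader migrates schema and data in one go from a live connection;
# the source here is MySQL, but MS SQL and SQLite work the same way.
pgloader \
  mysql://migrator:secret@prod-mysql.example.com/appdb \
  postgresql://migrator:secret@pg-staging.example.com/appdb \
  > "$LOG" 2>&1

# Scheduled nightly with a crontab entry such as:
#   0 2 * * *  /usr/local/bin/nightly-migration.sh
```
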

We see how to implement this step in detail in the Migrating to PostgreSQL, Tools and Methodology White Paper. Also, the About page of this website contains more detailed information about how pgloader implements fully automated database migrations, from MySQL, MS SQL or SQLite live connections.

Migrating Code and SQL Queries

Now that you have a fresh CI/CD environment with yesterday’s production data every morning, it’s time to rewrite those SQL queries for PostgreSQL. Depending on the RDBMS you are migrating from, the differences between the SQL engines will range from mere syntactic sugar to completely missing features.

Take your time and try to understand how to do things in PostgreSQL if you want to migrate your application rather than just porting it, that is, merely running the same feature set on top of PostgreSQL instead of your previous choice.
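As a small example of the syntactic-sugar end of the spectrum, here is a common MySQL-to-PostgreSQL rewrite; the table and column names are placeholders:

```shell
# A MySQL query such as:
#
#   SELECT `user_id`, GROUP_CONCAT(tag SEPARATOR ', ')
#     FROM tags GROUP BY `user_id`;
#
# ports to PostgreSQL as:
psql -d appdb -c "
  SELECT user_id, string_agg(tag, ', ' ORDER BY tag)
    FROM tags
   GROUP BY user_id;
"
# Other frequent one-to-one rewrites: IFNULL() becomes coalesce(),
# backtick identifier quoting becomes standard double quotes, and
# INSERT ... ON DUPLICATE KEY UPDATE becomes INSERT ... ON CONFLICT.
```
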

My book Mastering PostgreSQL in Application Development is a good companion when learning how to best use PostgreSQL and its advanced SQL feature set.

Continuous Migration

Migrating from one database technology to PostgreSQL requires solid project methodology. In this document we have shown a simple and effective database migration method, named Continuous Migration:

  1. Set up your target PostgreSQL architecture
  2. Fork a continuous integration environment that uses PostgreSQL
  3. Migrate the data over and over again every night, from your current production RDBMS
  4. As soon as the CI is all green using PostgreSQL, schedule the D-day
  5. Migrate without any unpleasant surprises… and enjoy!

This method makes it possible to break down a huge migration effort into smaller chunks, and also to pause and resume the project if need be. It also ensures that your migration process is well understood and handled by your team, drastically limiting the number of surprises you may otherwise encounter on migration D-day.

The third step isn’t always as easy to implement as it should be, and that’s why the pgloader open source project exists: it implements fully automated database migrations!

 
 
 
 
