Reposted from: https://severalnines.com/blog/top-pg-clustering-ha-solutions-postgresql

If your system relies on PostgreSQL and you are looking for a clustering solution for high availability (HA), be aware up front that it is a complex task, though not an impossible one.

We are going to discuss several solutions, and you can choose among them according to your fault-tolerance requirements.

Unlike MySQL or Oracle, PostgreSQL does not natively support multi-master clustering. Nevertheless, many commercial and community products offer it, along with other features such as replication or load balancing for PostgreSQL.

For a start, let's review some basic concepts:

What is High Availability?

It is the proportion of time that a service is available. The target is usually defined by the business.

Redundancy is the basis for high availability; in the event of an incident, we can continue to operate without problems.
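
To make an availability target concrete, it helps to translate it into a downtime budget. The following is a minimal sketch, assuming a 365-day year; the function name and the sample targets are illustrative and do not come from any particular product.

```python
# Convert an availability target into the downtime it allows per year.
# Assumption: a 365-day year; real SLAs may define the period differently.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the yearly downtime budget for a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_seconds(target) / 3600
        print(f"{target}% -> {hours:.2f} hours of downtime per year")
```

Each additional "nine" cuts the budget by a factor of ten, which is why the target should be agreed with the business before choosing a solution.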

Continuous Recovery

If an incident occurs and we have to restore a backup and then apply the WAL logs, the recovery time will be very high, and we cannot call that high availability.

However, if the backups and the archived logs are kept on a contingency server, we can apply the logs as they arrive.

If the logs are shipped and applied every minute, the contingency database is in continuous recovery and lags production by at most one minute.
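
The one-minute bound can be checked with a small calculation. This is a simplified sketch that assumes logs are shipped at fixed multiples of the interval and that transfer and apply time are negligible; `standby_lag` is a hypothetical helper, not a PostgreSQL function.

```python
import math


def standby_lag(commit_time: float, ship_interval: float) -> float:
    """Seconds a commit waits until the next shipment carries it to the standby.

    Assumes shipments happen at 0, ship_interval, 2 * ship_interval, ...
    and that transfer and replay take no time.
    """
    next_shipment = math.ceil(commit_time / ship_interval) * ship_interval
    return next_shipment - commit_time


if __name__ == "__main__":
    commits = [0.5, 12.0, 59.9, 61.0, 119.0]  # commit times in seconds
    lags = [standby_lag(t, 60.0) for t in commits]
    print(f"worst lag: {max(lags):.1f} s")
```

In a real deployment the lag also includes the time to copy and replay each log file, so the shipping interval is a lower bound on the worst case rather than a guarantee.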

Standby databases

The idea of a standby database is to keep a copy of a production database that always has the same data, and that is ready to be used in case of an incident.

There are several ways to classify a standby database:

By the nature of the replication:

  • Physical standbys: Disk blocks are copied.
  • Logical standbys: Data changes are streamed.

By the synchronicity of the transactions:

  • Asynchronous: Data loss is possible.
  • Synchronous: Data loss is not possible; commits on the master wait for the response of the standby.
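
The difference between the two modes can be illustrated with a toy model. This is only a sketch: the `Primary` and `Standby` classes are hypothetical and do not reflect how PostgreSQL implements replication internally. It shows why an asynchronous commit can be lost on failover while a synchronous one cannot.

```python
class Standby:
    def __init__(self):
        self.applied = []

    def receive(self, txn):
        self.applied.append(txn)
        return True  # acknowledgement sent back to the primary


class Primary:
    def __init__(self, standby, synchronous):
        self.standby = standby
        self.synchronous = synchronous
        self.committed = []  # transactions confirmed to the client
        self.pending = []    # committed locally, not yet sent to the standby

    def commit(self, txn):
        if self.synchronous:
            # Wait for the standby before confirming the commit.
            self.standby.receive(txn)
        else:
            # Confirm immediately; ship the change later.
            self.pending.append(txn)
        self.committed.append(txn)

    def flush(self):
        """Ship everything that is still pending (asynchronous mode)."""
        while self.pending:
            self.standby.receive(self.pending.pop(0))


def lost_on_failover(primary):
    """Transactions the client saw as committed but the standby never got."""
    return [t for t in primary.committed if t not in primary.standby.applied]


if __name__ == "__main__":
    async_primary = Primary(Standby(), synchronous=False)
    async_primary.commit("t1")
    async_primary.flush()
    async_primary.commit("t2")  # the primary crashes before the next flush
    print(lost_on_failover(async_primary))
```

The price of the synchronous mode is latency: every commit waits for a network round trip to the standby.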

By the usage:

  • Warm standbys: Do not accept connections.
  • Hot standbys: Accept read-only connections.
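
Because a hot standby accepts connections, a client can ask a node which role it currently has. The sketch below uses the PostgreSQL function `pg_is_in_recovery()` and works with any DB-API cursor; `node_role` and `FakeCursor` are hypothetical names, and the fake cursor only stands in for a real connection so the example runs without a database.

```python
def node_role(cursor) -> str:
    """Return 'standby' if the server is in recovery, otherwise 'primary'."""
    cursor.execute("SELECT pg_is_in_recovery()")
    (in_recovery,) = cursor.fetchone()
    return "standby" if in_recovery else "primary"


class FakeCursor:
    """Stand-in for a real DB-API cursor, for demonstration only."""

    def __init__(self, in_recovery):
        self._in_recovery = in_recovery
        self.last_sql = None

    def execute(self, sql):
        self.last_sql = sql

    def fetchone(self):
        return (self._in_recovery,)


if __name__ == "__main__":
    print(node_role(FakeCursor(in_recovery=True)))
    print(node_role(FakeCursor(in_recovery=False)))
```

A warm standby cannot be probed this way, since it refuses connections altogether.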

Clusters

A cluster is a group of hosts working together and seen as one.

This provides a way to achieve horizontal scalability and the ability to process more work by adding servers.

It can resist the failure of a node and continue to work transparently.

There are two models depending on what is shared:

  • Shared-storage: All nodes access the same storage with the same information.
  • Shared-nothing: Each node has its own storage, which may or may not have the same information as the other nodes, depending on the structure of our system.

Let's now review some of the clustering options we have in PostgreSQL.

Distributed Replicated Block Device

DRBD is a Linux kernel module that implements synchronous block replication over the network. By itself it does not implement a cluster, and it does not handle failover or monitoring. You need complementary software for that, for example Corosync + Pacemaker + DRBD.

Example:

  • Corosync: Handles messages between hosts.
  • Pacemaker: Starts and stops services, making sure they are running only on one host.
  • DRBD: Synchronizes the data at the level of block devices.
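
The rule that a service runs on only one host at a time can be sketched as a small placement function. This is an illustration of the idea only, not Pacemaker's actual algorithm; `place_service` and the host names are hypothetical.

```python
from typing import Dict, Optional


def place_service(healthy: Dict[str, bool], current: Optional[str]) -> Optional[str]:
    """Pick the single host that should run the service.

    Keep the service where it is while that host is healthy; otherwise
    move it to the first healthy host, or return None if none is left.
    """
    if current is not None and healthy.get(current, False):
        return current
    for host, is_healthy in healthy.items():
        if is_healthy:
            return host
    return None


if __name__ == "__main__":
    hosts = {"node1": True, "node2": True}
    active = place_service(hosts, None)
    hosts["node1"] = False  # node1 fails
    active = place_service(hosts, active)
    print(active)
```

A real cluster manager also has to fence the failed host before moving the service, so that two hosts never write to the same data at once.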

ClusterControl

ClusterControl is agentless management and automation software for database clusters. It helps you deploy, monitor, manage and scale your database server or cluster directly from its user interface.

ClusterControl is able to handle most of the administration tasks required to maintain database servers or clusters.

With ClusterControl you can:

  • Deploy standalone, replicated or clustered databases on the technology stack of your choice.
  • Automate failovers, recovery and day to day tasks uniformly across polyglot databases and dynamic infrastructures.
  • Create full or incremental backups and schedule them.
  • Monitor your entire database and server infrastructure in real time from a unified view.
  • Easily add or remove a node with a single action.

On PostgreSQL, if you have an incident, your slave can be promoted to master status automatically.

It is a very complete tool that comes with a free community version (which also includes a free enterprise trial).

[Figures: Node Stats View; Cluster Nodes View]

Rubyrep

Rubyrep is an asynchronous, multi-master, multi-platform replication solution (implemented in Ruby or JRuby) that supports more than one DBMS (MySQL or PostgreSQL).

It is based on triggers and does not replicate DDL, users or grants.

Its main objective is simplicity of use and administration.

Some features:

  • Simple configuration
  • Simple installation
  • Platform independent, table design independent.

Pgpool-II

Pgpool-II is middleware that sits between PostgreSQL servers and PostgreSQL database clients.

Some features:

  • Connection pool
  • Replication
  • Load balancing
  • Automatic failover
  • Parallel queries

It can be configured on top of streaming replication.
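
Load balancing on top of streaming replication usually means sending reads to standbys and writes to the primary. The sketch below illustrates that idea with a naive rule; it is not Pgpool-II's actual routing logic, and `QueryRouter` and the host names are hypothetical.

```python
import itertools


class QueryRouter:
    """Send SELECT statements to standbys in turn and everything else to the primary."""

    def __init__(self, primary, standbys):
        self.primary = primary
        # Round-robin iterator over the standbys, if there are any.
        self._standbys = itertools.cycle(standbys) if standbys else None

    def route(self, sql):
        # Naive check: only the leading keyword is inspected.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._standbys is not None:
            return next(self._standbys)
        return self.primary


if __name__ == "__main__":
    router = QueryRouter("pg-primary", ["pg-standby-1", "pg-standby-2"])
    for statement in ("SELECT 1", "SELECT 2", "INSERT INTO t VALUES (1)"):
        print(statement, "->", router.route(statement))
```

A production router needs a more careful classification, because statements such as `SELECT ... FOR UPDATE` or a `SELECT` that calls a writing function must still go to the primary.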

Bucardo

Bucardo provides asynchronous cascading master-slave replication (row-based, using triggers and queueing in the database) and asynchronous master-master replication (row-based, using triggers and customized conflict resolution).

Bucardo requires a dedicated database and runs as a Perl daemon that communicates with this database and all other databases involved in the replication. It can run as multimaster or multislave.

Master-slave replication involves one or more sources going to one or more targets. The source must be PostgreSQL, but the targets can be PostgreSQL, MySQL, Redis, Oracle, MariaDB, SQLite, or MongoDB.
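
Row-based master-master replication with a pluggable conflict handler can be sketched as follows. This is an illustration of the concept only, not Bucardo's implementation: `sync_rows` and `latest_wins` are hypothetical, rows are modelled as `(value, modified_at)` tuples keyed by primary key, and "latest change wins" is just one possible strategy.

```python
def latest_wins(row_a, row_b):
    """Conflict handler: keep the row with the newer modification time."""
    return row_a if row_a[1] >= row_b[1] else row_b


def sync_rows(a, b, resolve):
    """Bring two {primary_key: (value, modified_at)} tables into agreement."""
    for key in set(a) | set(b):
        if key not in a:
            a[key] = b[key]          # row exists only on b: copy to a
        elif key not in b:
            b[key] = a[key]          # row exists only on a: copy to b
        elif a[key] != b[key]:
            # Both sides changed the row: let the handler pick a winner.
            a[key] = b[key] = resolve(a[key], b[key])


if __name__ == "__main__":
    site_a = {1: ("alice", 10), 2: ("bob", 20)}
    site_b = {1: ("alice", 10), 2: ("robert", 25), 3: ("carol", 5)}
    sync_rows(site_a, site_b, latest_wins)
    print(site_a == site_b)
```

The sketch relies on every row having a key, which matches the drawback listed below that tables without a unique key cannot be replicated incrementally.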

Some features:

  • Load balancing
  • Slaves are not constrained and can be written to
  • Partial replication
  • Replication on demand (changes can be pushed automatically or when desired)
  • Slaves can be "pre-warmed" for quick setup

Drawbacks:

  • Cannot handle DDL
  • Cannot handle large objects
  • Cannot incrementally replicate tables without a unique key
  • Will not work on versions older than Postgres 8

Postgres-XC

Postgres-XC is an open source project to provide a write-scalable, synchronous, symmetric and transparent PostgreSQL cluster solution. It is a collection of tightly coupled database components which can be installed on more than one physical or virtual machine.

Write-scalable means Postgres-XC can be configured with as many database servers as you want and handle many more writes (updating SQL statements) compared to what a single database server can do.

You can have more than one database server that clients connect to, and each of them provides a single, consistent cluster-wide view of the database.

Any database update from any database server is immediately visible to any other transactions running on different masters.

Transparent means you do not have to worry about how your data is stored in more than one database server internally.

You can configure Postgres-XC to run on multiple servers. Your data is stored in a distributed way, that is, partitioned or replicated, as chosen by you for each table. When you issue queries, Postgres-XC determines where the target data is stored and issues corresponding queries to servers containing the target data.
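
The per-table choice between partitioned and replicated storage determines which servers a statement has to touch. The following sketch illustrates that routing decision under simple assumptions: the node names, the table catalogue and the CRC32-based hash are hypothetical and are not Postgres-XC's actual distribution function.

```python
import zlib

# Hypothetical cluster layout and per-table distribution choice.
NODES = ["datanode1", "datanode2", "datanode3"]
TABLES = {"orders": "partitioned", "countries": "replicated"}


def nodes_for_write(table, key):
    """Servers that must receive a write for the given row key."""
    if TABLES[table] == "replicated":
        return list(NODES)  # every copy has to be updated
    index = zlib.crc32(str(key).encode()) % len(NODES)
    return [NODES[index]]   # exactly one partition owns the row


def nodes_for_read(table, key):
    """Servers that can answer a lookup of the given row key."""
    if TABLES[table] == "replicated":
        return [NODES[0]]   # any single copy is enough
    return nodes_for_write(table, key)


if __name__ == "__main__":
    print(nodes_for_write("orders", 42))
    print(nodes_for_write("countries", "AR"))
    print(nodes_for_read("countries", "AR"))
```

Small, rarely updated lookup tables are natural candidates for replication, while large, write-heavy tables benefit from partitioning.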

Citus

Citus is a drop-in replacement for PostgreSQL with built-in high availability features such as auto-sharding and replication. Citus shards your database and replicates multiple copies of each shard across the cluster of commodity nodes. If any node in the cluster becomes unavailable, Citus transparently redirects any writes or queries to one of the other nodes which houses a copy of the impacted shard.

Some features:

  • Automatic logical sharding
  • Built-in replication
  • Data-center aware replication for disaster recovery
  • Mid-query fault tolerance with advanced load balancing
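
The redirection to another copy of a shard can be pictured with a small lookup. This is a sketch of the idea only, not Citus internals; the placement map, `pick_node` and the worker names are hypothetical.

```python
# Hypothetical placement map: each shard is stored on two worker nodes.
PLACEMENTS = {
    "shard_1": ["worker-a", "worker-b"],
    "shard_2": ["worker-b", "worker-c"],
}


def pick_node(shard, down):
    """Return the first node holding the shard that is not marked as down."""
    for node in PLACEMENTS[shard]:
        if node not in down:
            return node
    raise RuntimeError(f"no available copy of {shard}")


if __name__ == "__main__":
    print(pick_node("shard_1", down=set()))
    print(pick_node("shard_1", down={"worker-a"}))
```

With two copies per shard the cluster tolerates the loss of any single node; losing both nodes that hold the same shard makes that shard unavailable.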

You can increase the uptime of your real-time applications powered by PostgreSQL and minimize the impact of hardware failures on performance. You can achieve this with built-in high availability tools minimizing costly and error-prone manual intervention.

Postgres-XL

It is a shared-nothing, multi-master clustering solution which can transparently distribute a table across a set of nodes and execute queries in parallel on those nodes. It has an additional component called Global Transaction Manager (GTM) that provides a globally consistent view of the cluster. The project is based on the 9.5 release of PostgreSQL. Some companies, such as 2ndQuadrant, provide commercial support for the product.

Postgres-XL is a horizontally scalable open source SQL database cluster, flexible enough to handle varying database workloads:

  • OLTP write-intensive workloads
  • Business Intelligence requiring MPP parallelism
  • Operational data store
  • Key-value store
  • GIS Geospatial
  • Mixed-workload environments
  • Multi-tenant provider hosted environments

Components:

  • Global Transaction Manager (GTM): The Global Transaction Manager ensures cluster-wide transaction consistency.
  • Coordinator: The Coordinator manages the user sessions and interacts with GTM and the data nodes.
  • Data Node: The Data Node is where the actual data is stored.
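
How the three components cooperate can be sketched in a few classes. This is a heavily simplified model under stated assumptions: the class and method names are hypothetical, the GTM here only hands out increasing transaction identifiers, and rows are placed with a CRC32 hash that is not the product's real distribution function.

```python
import itertools
import zlib


class GTM:
    """Hands out cluster-wide, strictly increasing transaction identifiers."""

    def __init__(self):
        self._ids = itertools.count(1)

    def next_transaction_id(self):
        return next(self._ids)


class DataNode:
    """Stores the actual rows, tagged with the transaction that wrote them."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, txid, key, value):
        self.rows[key] = (value, txid)


class Coordinator:
    """Accepts client requests, asks the GTM for an id and picks a data node."""

    def __init__(self, gtm, data_nodes):
        self.gtm = gtm
        self.data_nodes = data_nodes

    def insert(self, key, value):
        txid = self.gtm.next_transaction_id()
        index = zlib.crc32(str(key).encode()) % len(self.data_nodes)
        self.data_nodes[index].write(txid, key, value)
        return txid


if __name__ == "__main__":
    gtm = GTM()
    nodes = [DataNode("dn1"), DataNode("dn2")]
    first, second = Coordinator(gtm, nodes), Coordinator(gtm, nodes)
    print(first.insert("a", 1), second.insert("b", 2))
```

Because both coordinators draw identifiers from the same GTM, transactions are ordered consistently no matter which coordinator a client connects to.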

Conclusion

There are many more products for building a high availability environment for PostgreSQL, but you have to be careful with:

  • New products, not sufficiently tested
  • Discontinued projects
  • Limitations
  • Licensing costs
  • Very complex implementations
  • Unsafe solutions

You must also take your infrastructure into account. If you have only one application server, then no matter how well you have configured high availability for the databases, your service becomes inaccessible when that application server fails. Analyze the single points of failure in the infrastructure carefully and try to eliminate them.
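
A first pass at this analysis can be as simple as listing every tier that has fewer than two instances. The sketch below assumes a hypothetical inventory of tiers and instance counts; it ignores shared dependencies such as a common power supply or network switch.

```python
def single_points_of_failure(tiers):
    """Return the tiers that have fewer than two instances."""
    return [name for name, count in tiers.items() if count < 2]


if __name__ == "__main__":
    # Hypothetical inventory: tier name -> number of redundant instances.
    infrastructure = {
        "load balancer": 2,
        "application server": 1,
        "database": 3,
    }
    print(single_points_of_failure(infrastructure))
```

In this example the database tier is redundant, but the single application server still takes the whole service down when it fails.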

Taking these points into account, you can find a solution that fits your needs and requirements and implement your high availability cluster without unnecessary headaches. Go ahead and good luck!

 
 
 
 
