MariaDB Galera Cluster是什么?  


Galera Cluster是由第三方公司Codership所研发的一套免费开源的集群高可用方案,实现了数据零丢失,官网地址为http://galeracluster.com/

其在MySQLInnoDB存储引擎基础上打了wrep(虚拟全同步复制),Percona/MariaDB已捆绑在各自的发行版本中

MariaDB Galera Cluster是MariaDB同步多主机集群。它仅支持XtraDB/InnoDB存储引擎(虽然有对MyISAM实验支持,具体看wsrep_replicate_myisam系统变量)

MariaDB Galera Cluster主要功能

1.同步复制

2.真正的multi-master,即所有节点可以同时读写数据库

3. 自动的节点成员控制,失效节点自动被清除

4. 新节点加入数据自动复制

5.真正的并行复制,行级

6.用户可以直接连接集群,使用感受上与MySQL完全一致

优势

  • 因为是多主,所以不存在Slavelag(延迟)
  • 不存在丢失事务的情况
  • 同时具有读和写的扩展能力
  • 更小的客户端延迟
  • 节点间数据是同步的,而Master/Slave模式是异步的,不同slave上的binlog可能是不同的

缺点

  • 加入新节点时开销大,需要复制完整的数据
  • 不能有效地解决写扩展的问题,所有的写操作都发生在所有的节点
  • 有多少个节点,就有多少份重复的数据
  • 由于事务提交需要跨节点通信,即涉及分布式事务操作,因此写入会比主从复制慢很多,节点越多,写入越慢,死锁和回滚也会更加频繁
  • 对网络要求比较高,如果网络出现波动不稳定,则可能会造成两个节点失联,Galera Cluster集群会发生脑裂,服务将不可用

存在局限

  1. 仅支持InnoDB/XtraDB存储引擎,任何写入其他引擎的表,包括mysql.*表都不会被复制。但是DDL语句可以复制,但是insert into mysql.user(MyISAM存储引擎)之类的插入数据不会被复制
  2. Delete操作不支持没有主键的表,因为没有主键的表在不同的节点上的顺序不同,如果执行select … limit …将出现不同的结果集
  3. LOCK/UNLOCK TABLES/FLUSH TABLES WITH READ LOCKS不支持单表所锁,以及锁函数GET_LOCK()、RELEASE_LOCK(),但FLUSH TABLES WITH READ LOCK支持全局表锁
  4. General Query Log日志不能保存在表中,如果开始查询日志,则只能保存到文件中
  5. 不能有大事务写入,不能操作wsrep_max_ws_rows=131072(行),且写入集不能超过wsrep_max_ws_size=1073741824(1GB),否则客户端直接报错
  6. 由于集群是乐观锁并发控制,因此,在commit阶段会有事务冲突发生。如果两个事务在集群中的不同节点上对同一行写入并提交,则失败的节点将回滚,客户端返回死锁报错
  7. XA分布式事务不支持Codership Galera Cluster,在提交时可能会回滚
  8. 整个集群的写入吞吐量取决于最弱的节点限制,集群要使用同一的配置

服务器环境说明

系统版本 MariaDB版本 主机IP 节点名称
CentOS 7.9 mariadb-10.3.27 192.168.40.50 MariaDB-Node1
CentOS 7.9 mariadb-10.3.27 192.168.40.60 MariaDB-Node2
CentOS 7.9 mariadb-10.3.27 192.168.40.70 MariaDB-Node3

环境准备

[root@MariaDB-Node1 ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
[root@MariaDB-Node1 ~]# setenforce 0
setenforce: SELinux is disabled
[root@MariaDB-Node1 ~]# systemctl stop firewalld.service
[root@MariaDB-Node1 ~]# systemctl disable firewalld.service
[root@MariaDB-Node1 ~]# vi /etc/security/limits.conf * soft nofile 65536
* hard nofile 65536 [root@MariaDB-Node1 ~]#sysctl -p

安装MariaDB

参考我之前的博客安装,采用二进制安装

https://www.cnblogs.com/mike666/p/13666469.html

三台数据库主机同时安装  MariaDB 10.3.27数据库

备注:从MariaDB 10.1.20 版本开始,Galera Cluster就已经包含在MariaDB包里面了,不需要单独部署MariaDB-Galera-server 和galera 包

安装 Galera 依赖包

安装依赖包
[root@MariaDB-Node1 ~]# yum -y install lsof cmake gcc gcc-c++ autoconf automake zlib* libxml* \
ncurses ncurses-devel libtool libtool-ltdl-devel* make bison bison-devel \
openssl-devel libevent-devel \
libaio libaio-devel pam-devel boost boost-devel valgrind-devel \
libnl-devel popt-devel popt-static bzr nmap-* check-devel 安装Scons
[root@MariaDB-Node1 ~]# wget http://prdownloads.sourceforge.net/scons/scons-2.5.0.tar.gz
[root@MariaDB-Node1 ~]# tar zxvf scons-2.5.0.tar.gz -C /usr/local/ && rm -rf scons-2.5.0.tar.gz
[root@MariaDB-Node1 ~]# cd /usr/local/scons-2.5.0/
[root@MariaDB-Node1 /usr/local/scons-2.5.0]# python setup.py install
[root@MariaDB-Node1 /usr/local/scons-2.5.0]# cd
[root@MariaDB-Node1 ~]# scons -v
SCons by Steven Knight et al.:
script: v2.5.0.rel_2.5.0:3544:95d356f188a3[MODIFIED], 2016/04/09 14:38:50, by bdbaddog on ubuntu1404-32bit
engine: v2.5.0.rel_2.5.0:3544:95d356f188a3[MODIFIED], 2016/04/09 14:38:50, by bdbaddog on ubuntu1404-32bit
engine path: ['/usr/lib/scons-2.5.0/SCons']
Copyright (c) 2001 - 2016 The SCons Foundation
[root@MariaDB-Node1 ~]#

三台都要安装这个依赖包,不然启动集群可能会报错

修改配置文件my.cnf

下面我们开始配置MariaDB Galera Cluster,分别修改MariaDB Galera集群的每个节点上的/etc/my.cnf文件,具体每个节点的内容如下:

192.168.40.50 配置文件修改:

[root@MariaDB-Node1 ~]# vim /etc/my.cnf

[client]
port = 3306
socket = /tmp/mysql.sock [mysqld]
user = mysql
datadir = /data/mysql/data
pid-file = /data/mysql/mysql.pid
server-id = 50
relay_log =/data/logs/mysql/mysql_relay.log
bind-address = 0.0.0.0 tmpdir=/data/tmp init_connect ='SET NAMES utf8'
character-set-server = utf8
skip-name-resolve
back_log = 300 max_connections = 8019
max_connect_errors = 1024000
open_files_limit = 65535
table_open_cache = 2048
max_allowed_packet = 50M
max_heap_table_size = 512M
tmp_table_size = 256M read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
key_buffer_size = 512M
thread_cache_size = 128 query_cache_type = 2
query_cache_size = 128M
query_cache_limit = 5M
thread_stack = 192k
ft_min_word_len = 4 log_bin = /data/mysql/mysql_bin.log
binlog_format = ROW
expire_logs_days = 7 log_error = /data/logs/mysql/mysql_error.log
slow_query_log = 1
long_query_time = 1
log_slow_verbosity=query_plan
slow_query_log_file = /data/logs/mysql/mysql_slow.log
performance_schema = 0 skip-external-locking #跳过外部锁定,避免external locking bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 128M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1 default_storage_engine = InnoDB
innodb_file_per_table = 1
innodb_open_files = 500
innodb_buffer_pool_size = 2048M
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_purge_threads = 1
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 10
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1
###New ADD
innodb_rollback_on_timeout = 1
innodb_force_recovery=0
interactive_timeout = 28800
wait_timeout = 120 [galera]
wsrep_provider=/usr/local/mysql/lib/galera/libgalera_smm.so # Galera library 安装包自带有
wsrep_cluster_address="gcomm://192.168.40.50,192.168.40.60,192.168.40.70" # Galera cluster 集群地址
wsrep_cluster_name='Galera_Cluster' # Galera集群名称
wsrep_node_address='192.168.40.50' # 该节点的地址
wsrep_node_name='MariaDB-Node1' # 该节点的主机名
wsrep_slave_threads = 8 #开启并行复制线程,根据CPU核数设置
wsrep_log_conflicts=1
wsrep_on=ON #开启全同步复制模式
wsrep_provider_options="gcache.size=2G" #同步复制缓冲池
wsrep_forced_binlog_format=ROW
wsrep_drupal_282555_workaround=1
wsrep_max_ws_size=2147483647
wsrep_sst_method=rsync #拷贝模式
wsrep_sst_auth=galera:galera #galera集群认证用户:密码
query_cache_size=0 #关闭查询缓存
default_storage_engine=InnoDB #Mariadb存储引擎
#wsrep_dirty_reads=0 #当值是1的时候这个节点是只读节点
#wsrep_sst_method=[mysqldump|xtrabackup|xtrabackup-v2|rsync|rsync_wan]
#wsrep_sst_method=xtrabackup-v2 [mysqldump]
quick
max_allowed_packet = 16M [myisamchk]
key_buffer_size = 512M
sort_buffer_size = 8M
read_buffer = 4M
write_buffer = 4M [root@MariaDB-Node1 ~]#

192.168.40.60配置文件修改:

[root@MariaDB-Node2 ~]# vim /etc/my.cnf

[client]
port = 3306
socket = /tmp/mysql.sock [mysqld]
user = mysql
datadir = /data/mysql/data
pid-file = /data/mysql/mysql.pid
server-id = 60
relay_log =/data/logs/mysql/mysql_relay.log
bind-address = 0.0.0.0 tmpdir=/data/tmp init_connect ='SET NAMES utf8'
character-set-server = utf8
skip-name-resolve
back_log = 300 max_connections = 8019
max_connect_errors = 1024000
open_files_limit = 65535
table_open_cache = 2048
max_allowed_packet = 50M
max_heap_table_size = 512M
tmp_table_size = 256M read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
key_buffer_size = 512M
thread_cache_size = 128 query_cache_type = 2
query_cache_size = 128M
query_cache_limit = 5M
thread_stack = 192k
ft_min_word_len = 4 log_bin = /data/mysql/mysql_bin.log
binlog_format = ROW
expire_logs_days = 7 log_error = /data/logs/mysql/mysql_error.log
slow_query_log = 1
long_query_time = 1
log_slow_verbosity=query_plan
slow_query_log_file = /data/logs/mysql/mysql_slow.log
performance_schema = 0 skip-external-locking #跳过外部锁定,避免external locking bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 128M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1 default_storage_engine = InnoDB
innodb_file_per_table = 1
innodb_open_files = 500
innodb_buffer_pool_size = 2048M
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_purge_threads = 1
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 10
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1 innodb_rollback_on_timeout = 1
innodb_force_recovery=0
interactive_timeout = 28800
wait_timeout = 120 [galera]
wsrep_provider=/usr/local/mysql/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://192.168.40.50,192.168.40.60,192.168.40.70"
wsrep_cluster_name='Galera_Cluster'
wsrep_node_address='192.168.40.60'
wsrep_node_name='MariaDB-Node2'
wsrep_slave_threads = 8 #开启并行复制线程,根据CPU核数设置
wsrep_log_conflicts=1
wsrep_on=ON #开启全同步复制模式
wsrep_provider_options="gcache.size=2G" #同步复制缓冲池
wsrep_forced_binlog_format=ROW
wsrep_drupal_282555_workaround=1
wsrep_max_ws_size=2147483647
wsrep_sst_method=rsync #拷贝模式
wsrep_sst_auth=galera:galera #galera集群认证用户:密码
query_cache_size=0 #关闭查询缓存
default_storage_engine=InnoDB #Mariadb存储引擎
#wsrep_dirty_reads=0 #当值是1的时候这个节点是只读节点
#wsrep_sst_method=[mysqldump|xtrabackup|xtrabackup-v2|rsync|rsync_wan]
#wsrep_sst_method=xtrabackup-v2 [mysqldump]
quick
max_allowed_packet = 16M [myisamchk]
key_buffer_size = 512M
sort_buffer_size = 8M
read_buffer = 4M
write_buffer = 4M [root@MariaDB-Node2 ~]#

192.168.40.70配置文件修改:

[root@MariaDB-Node3 ~]# vim /etc/my.cnf

[client]
port = 3306
socket = /tmp/mysql.sock [mysqld]
user = mysql
datadir = /data/mysql/data
pid-file = /data/mysql/mysql.pid
server-id = 70
relay_log =/data/logs/mysql/mysql_relay.log
bind-address = 0.0.0.0 tmpdir=/data/tmp init_connect ='SET NAMES utf8'
character-set-server = utf8
skip-name-resolve
back_log = 300 max_connections = 8019
max_connect_errors = 1024000
open_files_limit = 65535
table_open_cache = 2048
max_allowed_packet = 50M
max_heap_table_size = 512M
tmp_table_size = 256M read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
key_buffer_size = 512M
thread_cache_size = 128 query_cache_type = 2
query_cache_size = 128M
query_cache_limit = 5M
thread_stack = 192k
ft_min_word_len = 4 log_bin = /data/mysql/mysql_bin.log
binlog_format = ROW
expire_logs_days = 7 log_error = /data/logs/mysql/mysql_error.log
slow_query_log = 1
long_query_time = 1
log_slow_verbosity=query_plan
slow_query_log_file = /data/logs/mysql/mysql_slow.log
performance_schema = 0 skip-external-locking #跳过外部锁定,避免external locking bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 128M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1 default_storage_engine = InnoDB
innodb_file_per_table = 1
innodb_open_files = 500
innodb_buffer_pool_size = 2048M
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_purge_threads = 1
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 10
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1 innodb_rollback_on_timeout = 1
innodb_force_recovery=0
interactive_timeout = 28800
wait_timeout = 120 [galera]
wsrep_provider=/usr/local/mysql/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://192.168.40.50,192.168.40.60,192.168.40.70"
wsrep_cluster_name='Galera_Cluster'
wsrep_node_address='192.168.40.70'
wsrep_node_name='MariaDB-Node3'
wsrep_slave_threads = 8 #开启并行复制线程,根据CPU核数设置
wsrep_log_conflicts=1
wsrep_on=ON #开启全同步复制模式
wsrep_provider_options="gcache.size=2G" #同步复制缓冲池
wsrep_forced_binlog_format=ROW
wsrep_drupal_282555_workaround=1
wsrep_max_ws_size=2147483647
wsrep_sst_method=rsync #拷贝模式
wsrep_sst_auth=galera:galera #galera集群认证用户:密码
query_cache_size=0 #关闭查询缓存
default_storage_engine=InnoDB #Mariadb存储引擎
#wsrep_dirty_reads=0 #当值是1的时候这个节点是只读节点
#wsrep_sst_method=[mysqldump|xtrabackup|xtrabackup-v2|rsync|rsync_wan]
#wsrep_sst_method=xtrabackup-v2 [mysqldump]
quick
max_allowed_packet = 16M [myisamchk]
key_buffer_size = 512M
sort_buffer_size = 8M
read_buffer = 4M
write_buffer = 4M [root@MariaDB-Node3 ~]#

有几个参数需要修改的:

server-id = 50
wsrep_node_address='192.168.40.50'
wsrep_node_name='MariaDB-Node1'

修改对应的 server ID 、IP地址 和 主机名,其他的复制第一份到别的机器修改就可以

初始化集群

在MariaDB一个节点初始化安装  192.168.40.50 (其他节点不用操作)

[root@MariaDB-Node1 ~]# /usr/local/mysql/scripts/mysql_install_db --defaults-file=/etc/my.cnf --user=mysql

192.168.40.50 节点上通过bootstrap启动(第一次启动一定要使用 --wsrep-new-cluster,再次启动就不需要)

[root@MariaDB-Node1 ~]#  mysqld_safe --defaults-file=/etc/my.cnf --user=mysql  --wsrep-new-cluster &

其他节点操作

初始化节点(192.168.40.50)操作启动之后,别的主机只要正常启动就可以了

192.168.40.60操作:

[root@MariaDB-Node2 ~]# /etc/init.d/mysql start

192.168.40.70操作:

[root@MariaDB-Node3 ~]# /etc/init.d/mysql start

192.168.40.50操作:

初始化节点等待其他节点正常启动之后,再关闭带参数启动数据库,再已正常方式启动

[root@MariaDB-Node1 ~]# ps -ef|grep mysql
[root@MariaDB-Node1 ~]# kill 8316
[root@MariaDB-Node1 ~]# ps -ef|grep mysql
[root@MariaDB-Node1 ~]# /etc/init.d/mysql start

查看集群状态

[root@MariaDB-Node1 ~]# mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 17
Server version: 10.3.27-MariaDB-log MariaDB Server Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]>
MariaDB [(none)]>
MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.001 sec) MariaDB [(none)]> show global status like 'ws%';
+-------------------------------+----------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+----------------------------------------------------------+
| wsrep_applier_thread_count | 8 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 0.000000 |
| wsrep_causal_reads | 0 |
| wsrep_cert_deps_distance | 0.000000 |
| wsrep_cert_index_size | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_cluster_conf_id | 5 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | a4d25ee0-515c-11eb-b451-66b7a1b050e2 |
| wsrep_cluster_status | Primary |
| wsrep_cluster_weight | 3 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 0.000000 |
| wsrep_connected | ON |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000308077/0.00394484/0.0110307/0.00501106/3 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_flow_control_active | false |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_requested | false |
| wsrep_flow_control_sent | 0 |
| wsrep_gcomm_uuid | 4fd60274-523f-11eb-b7dd-476f9c9f0398 |
| wsrep_gmcast_segment | 0 |
| wsrep_incoming_addresses | 192.168.40.60:3306,192.168.40.70:3306,192.168.40.50:3306 |
| wsrep_last_committed | 0 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_cached_downto | 18446744073709551615 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_commits | 0 |
| wsrep_local_index | 2 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_local_state_uuid | a4d25ee0-515c-11eb-b451-66b7a1b050e2 |
| wsrep_open_connections | 0 |
| wsrep_open_transactions | 0 |
| wsrep_protocol_version | 9 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 25.3.31(r0ede97d) |
| wsrep_ready | ON |
| wsrep_received | 2 |
| wsrep_received_bytes | 306 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 9 |
+-------------------------------+----------------------------------------------------------+
66 rows in set (0.001 sec) MariaDB [(none)]>

可以看到集群正常使用有三个节点

注释

wsrep_cluster_status为Primary,表示节点为主节点,正常读写

wsrep_ready为ON,表示集群正常运行

wsrep_cluster_size为 3,表示集群有三个节点

遇到问题

如果数据库 全部关机 了,再次启动可以会遇到如下报错

[root@MariaDB-Node1 ~]# cd /data/logs/mysql/
[root@MariaDB-Node1 /data/logs/mysql]# tail -100f mysql_error.log 2021-01-09 13:49:33 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2021-01-09 13:49:33 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/local/mysql/lib/galera/libgalera_smm.so'
2021-01-09 13:49:33 0 [Note] WSREP: wsrep_load(): Galera 25.3.31(r0ede97d) by Codership Oy <info@codership.com> loaded successfully.
2021-01-09 13:49:33 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-01-09 13:49:33 0 [Note] WSREP: Found saved state: a4d25ee0-515c-11eb-b451-66b7a1b050e2:0, safe_to_bootstrap: 0
2021-01-09 13:49:33 0 [Note] WSREP: Passing config to GCS: base_dir = /data/mysql/data/; base_host = 192.168.40.50; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql/data//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 2G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout =
2021-01-09 13:49:33 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
2021-01-09 13:49:33 0 [Note] WSREP: wsrep_sst_grab()
2021-01-09 13:49:33 0 [Note] WSREP: Start replication
2021-01-09 13:49:33 0 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster
2021-01-09 13:49:33 0 [Note] WSREP: Setting initial position to a4d25ee0-515c-11eb-b451-66b7a1b050e2:0
2021-01-09 13:49:33 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2021-01-09 13:49:33 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.40.50,192.168.40.60,192.168.40.70) failed: 7
2021-01-09 13:49:33 0 [ERROR] Aborting

解决办法

[root@MariaDB-Node1 /data/logs/mysql]# cd /data/mysql/data/
[root@MariaDB-Node1 /data/mysql/data]# cat grastate.dat
# GALERA saved state
version: 2.1
uuid: a4d25ee0-515c-11eb-b451-66b7a1b050e2
seqno: -1
safe_to_bootstrap: 0
[root@MariaDB-Node1 /data/mysql/data]# 把 safe_to_bootstrap: 0 修改为 1 [root@MariaDB-Node1 /data/mysql/data]# vim grastate.dat
# GALERA saved state
version: 2.1
uuid: a4d25ee0-515c-11eb-b451-66b7a1b050e2
seqno: -1
safe_to_bootstrap: 1
[root@MariaDB-Node1 /data/mysql/data]# mysqld_safe --defaults-file=/etc/my.cnf --user=mysql  --wsrep-new-cluster &

把 safe_to_bootstrap: 0 修改为 1,然后再用带参数 启动就可以,其他节点使用正常方法启动

mysql galera 集群常见问题处理

一、mysql HA集群在断网过久或者所有节点都down了之后的恢复有以下的方法:
解决方案1:
1、等三台机器恢复网络通讯后,因为此时的mysql已经异常无法加入集群,因此需要先保证所有的mysql都是down的,再上台执行 /usr/local/mysql/bin/mysqld --wsrep-new-cluster --wsrep-cluster-address='gcomm://' & 这条命令,并进入mysql(只有一台机器能够成功执行,其他机器执行了过几秒钟都会异常退出这个进程,我们这里把能够成功执行的机器称为master)
2、此时三台只有一台能够成功进入mysql(即执行mysql这条命令),在非master上的两台上一台一台的执行/etc/init.d/mysql start,必须等一台成功了,另一台才能执行

3、在mysql中执行show status like "wsrep%";

我们需要保证的第一项为synced,以及第二项必须为三个mysql的ip

4、保证3的结果是想要的说明集群已经恢复了,此时需要将master机器上面的 /usr/local/mysql/bin/mysqld --wsrep-new-cluster --wsrep-cluster-address='gcomm://'这个进程kill掉,然后再执行 /etc/init.d/mysql start 即可

二、mysql HA集群某个节点无故down了并且有一段时间处于down的情况通过以下方式恢复:

1、 若日志里面出现以下日志

[Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (eb9f50c6-bc95-11e5-a735-9f48e437dc03): 1 (Operation not permitted)

解决方法:删除/var/lib/mysql/grastate.dat 文件(若还存在无法同步的情况则删除galera.cache文件)

2、 若那个down了的节点出现以下日志

(异常情况集群挂了)[ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions

解决方法:

1、/usr/local/mysql/bin/mysqld start --innodb_force_recovery=6

  • (SRV_FORCE_IGNORE_CORRUPT):忽略检查到的corrupt页
  • (SRV_FORCE_NO_BACKGROUND):阻止主线程的运行,如主线程需要执行full purge操作,会导致crash
  • (SRV_FORCE_NO_TRX_UNDO):不执行事务回滚操作
  • (SRV_FORCE_NO_IBUF_MERGE):不执行插入缓冲的合并操作
  • (SRV_FORCE_NO_UNDO_LOG_SCAN):不查看重做日志,InnoDB存储引擎会将未提交的事务视为已提交
  • (SRV_FORCE_NO_LOG_REDO):不执行前滚的操作

如果配置后出现以下情况:
130507 14:14:01  InnoDB: Waiting for the background threads to start
130507 14:14:02  InnoDB: Waiting for the background threads to start
130507 14:14:03  InnoDB: Waiting for the background threads to start
130507 14:14:04  InnoDB: Waiting for the background threads to start
130507 14:14:05  InnoDB: Waiting for the background threads to start
130507 14:14:06  InnoDB: Waiting for the background threads to start
130507 14:14:07  InnoDB: Waiting for the background threads to start
130507 14:14:08  InnoDB: Waiting for the background threads to start
130507 14:14:09  InnoDB: Waiting for the background threads to start

需要在galera.cfg中添加这一下:
如果在设置 innodb_force_recovery >2 的同时innodb_purge_thread = 0
2、mysqld --tc-heuristic-recover=ROLLBACK
3、删除/var/lib/mysql/ib_logfile*
4、当某个mysql节点挂了,并且存在三个mysql所在host有不同的网段,当mysql想重新加入需要一个sst的过程,sst时会需要知道集群中某个节点的ip因此需要制定参数--wsrep-sst-receive-address否则可能出现同步的ip不在三台机器所共有的网段

三、一个mysql节点若down了一段时间。重新启动的时候需要一些时间去同步数据,服务的启动超时时间不够,导致服务无法启动,解决方法如下:
The correct way to adjust systemd settings so they don't get overwritten is to create a directory and file as such:
/etc/systemd/system/mariadb.service.d/timeout.conf
[Service]

TimeoutStartSec=12min

或者直接修改/usr/lib/systemd/system/mariadb.service
[Service]

TimeoutStartSec=12min
这里的时间最少要大于90s,默认是90s之后执行 systemctl daemon-reload再重启服务即可
四、日志中出现类似如下错误:
160428 13:54:49 [ERROR] Slave SQL: Error 'Table 'manage_operations' already exists' on query. Default database: 'horizon'. Query: 'CREATE TABLE `manage_operations` (
    `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
    `name` varchar(50) NOT NULL,
    `type` varchar(20) NOT NULL,
    `operation` varchar(20) NOT NULL,
    `status` varchar(20) NOT NULL,
    `time` date NOT NULL,
    `operator` varchar(50) NOT NULL
) default charset=utf8', Error_code: 1050
160428 13:54:49 [Warning] WSREP: RBR event 1 Query apply warning: 1, 28585
160428 13:54:49 [Warning] WSREP: Ignoring error for TO isolated action: source: 752eecd1-0ce0-11e6-83fc-3e0502d0bdd2 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 24053 trx_id: -1 seqnos (l: 28668, g: 28585, s: 28584, d: 28584, ts: 80224119986850)
导致进程异常关闭,
此时可以通过执行mysqladmin flush-tables来刷新表项,这个问题的原因是三个节点之间的表同步存在问题,刷新一下表即可

五、日志出现以下错误:
160520 10:48:23 [Note] WSREP: COMMIT failed, MDL released: 367194
160520 10:48:23 [Note] WSREP: cert failure, thd: 358780 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 358784 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: COMMIT failed, MDL released: 367188
160520 10:48:23 [Note] WSREP: cert failure, thd: 359683 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 358808 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 367191 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 367196 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 367194 is_AC: 0, retry: 0 - 1 SQL: commit

160520 10:48:23 [Note] WSREP: cert failure, thd: 367188 is_AC: 0, retry: 0 - 1 SQL: commit

8、日志出现以下错误:

160820  3:13:41 [ERROR] Error in accept: Too many open files
160820  3:19:42 [ERROR] Error in accept: Too many open files
160827  3:16:24 [ERROR] Error in accept: Too many open files
160831 17:20:52 [ERROR] Error in accept: Too many open files
160831 19:54:29 [ERROR] Error in accept: Too many open files
160831 20:21:53 [ERROR] Error in accept: Too many open files
160901 11:25:57 [ERROR] Error in accept: Too many open files

解决方法

vim /usr/lib/systemd/system/mariadb.service

[Service]
 LimitNOFILE=10000

默认的mysql的open_file_limits是1024将该项增大,并且修改vim /etc/my.cnf.d/server.cnf该文件的open_files_limit值

systemctl daemon-reload

systemctl restart mysqld

查看mysql的open_file_limits值是否调整成功

cat /proc/$pid/limit

其中$pid为mysql进程的pid看看值是否调整成功,并看看日志是否还会出现上述错误

总结一下:

  1. 当所有节点都宕机的情况下,再次启动节点,Mariadb-Galera集群会出现无法启动的现象
  2. Mariadb-Galera集群启动是有顺序的,遵循一个原则:最后宕机的最先启动,因为集群认为这个节点的数据是最新的
  3. 在生产环境下应该避免使用大事务,不建议在高并发写入场景下使用Galera Cluster架构,会导致集群限流,从而引起整个集群hang住,出现生产故障。针对这种情况可以考虑主从,实现读写分离等手段
  4. 对数据一致性要求较高,并且数据写入不频繁,数据库容量也不大(50GB左右),网络状况良好的情况下,可以考虑使用Galera方案

本文分享完毕,感谢支持点赞~~

MariaDB Galera Cluster集群搭建的更多相关文章

  1. 【原】基于 HAproxy 1.6.3 Keeplived 在 Centos 7 中实现mysql mariadb galera cluster 集群分发读写 —— 上篇

    前言 有一段时间没有写blogs,乘着周末开始整理下haproxy + keeplived 实现 mysql mariadb galera cluster 集群访问环境的搭建工作. 本文集中讲hapr ...

  2. MariaDB Galera Cluster集群

    一.MariaDB Galera Cluster概要: 1.简述: MariaDB Galera Cluster 是一套在mysql innodb存储引擎上面实现multi-master及数据实时同步 ...

  3. Centos 7 下 Mysql 5.7 Galera Cluster 集群部署

     一.介绍 传统架构的使用,一直被人们所诟病,因为MySQL的主从模式,天生的不能完全保证数据一致,很多大公司会花很大人力物力去解决这个问题,而效果却一般,可以说,只能是通过牺牲性能,来获得数据一致性 ...

  4. 一次galera cluster集群故障节点无法启动问题排查

    现象 环境: Server version: 10.0.25-MariaDB-wsrep MariaDB Server, wsrep_25.13.raf7f02e 配置文件: [root@node-2 ...

  5. Redis Cluster集群搭建与应用

    1.redis-cluster设计 Redis集群搭建的方式有多种,例如使用zookeeper,但从redis 3.0之后版本支持redis-cluster集群,redis-cluster采用无中心结 ...

  6. 重要参考步骤---ProxySQL Cluster 集群搭建步骤

    环境 proxysql-1:192.168.20.202 proxysql-2:192.168.20.203 均采用yum方式安装 # cat <<EOF | tee /etc/yum.r ...

  7. Mysql Innodb cluster集群搭建

    之前搭建过一个Mysql Ndb cluster集群,但是mysql版本是5.7的,看到官网上mysql8的还是开发者版本,所以尝试搭建下mysql Innodb cluster集群. MySQL的高 ...

  8. redis 学习笔记(6)-cluster集群搭建

    上次写redis的学习笔记还是2014年,一转眼已经快2年过去了,在段时间里,redis最大的变化之一就是cluster功能的正式发布,以前要搞redis集群,得借助一致性hash来自己搞shardi ...

  9. mysql galera cluster 集群的分裂与仲裁机制

    集群的分裂 当集群由于网络原因分裂为几个单独的组时(一组可能是单节点,也可能是几个互联的节点),数据出现不一致,此时可能产生脑裂及数据不一致.这种情况 下,只有一组节点能够继续提供服务,这组节点的状态 ...

随机推荐

  1. 课堂练习之疫情APP

    title: 课堂练习之疫情查询APP date: 2020-03-17 20:08:51 tags: 在之前的体温记录APP上改进,只写出疫情信息查询页面的代码. 实体类与上篇博客SSM整合中的Ci ...

  2. 搭建xss-platform平台

    一直想搭在公网搭建自己的XSS平台用来验证XSS漏洞,使用别人的平台自己心里总会有担心被摘果子的顾虑,前几天参考了不少前人的博客,终于搭建好了,搭建的途中也遇到了不少坑,故把搭建的经验分享出来,大佬轻 ...

  3. 一文看懂 Kubernetes 服务发现: Service

    Service 简介   K8s 中提供微服务的实体是 Pod,Pod 在创建时 docker engine 会为 pod 分配 ip,"外部"流量通过访问该 ip 获取微服务.但 ...

  4. 密码学系列之:明文攻击和Bletchley Park

    目录 简介 crib和明文攻击 布莱奇利公园(Bletchley Park) 简介 明文攻击就是指已经知道了部分明文和它对应的加密后的字段,从而可以推测出使用的加密手段或者密码本.明文攻击这个故事还要 ...

  5. RDD、DF和DS的共性与区别

    共性: 1.都是spark平台下的分布式弹性数据集 2.都有惰性机制,创建.转换如map操作时不会立即执行,遇到foreach等Action算子时才开始运算. 3.都会自动缓存计算 4.都有parti ...

  6. Jwt令牌创建

    添加依赖 <dependencies> <!-- jwt --> <dependency> <groupId>io.jsonwebtoken</g ...

  7. 看我如何用微信上线CobaltStrike

    前言 DLL劫持漏洞是老生常谈的一个漏洞,已经被前辈们各种奇技淫巧玩烂.但DLL劫持技术在后渗透和域渗透中的权限提升和权限维持都起到了至关重要的作用.本文简单剖析DLL劫持技术并通过实例应用来查看如何 ...

  8. 使用HBase Shell命令

    使用HBase Shell命令 或 使用HBase Java API完成: 列出HBase所有的表的相关信息,例如表名: 在终端打印出指定的表的所有记录数据: 向已经创建好的表添加和删除指定的列族或列 ...

  9. uniapp云打包配置讲解

    HBuilderX开发工具,菜单栏:发行(U) → 原生App云打包(P) 安卓云打包配置: 云打包配置分为公共测试证书和自有证书. 云打包配置使用公共测试证书很简单,直接勾选后打包. 如果要测试第三 ...

  10. 20201207-2 openpyxl 库与模块导入

    1-1 import openpyxl # 通过文件路径,打开工作簿 wb1 = openpyxl.load_workbook('./demo_excel.xlsx') # 用 Workbook() ...