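Notes on Replacing a Failed Brick in a GlusterFS Distributed Storage System

An earlier post covered deploying a GlusterFS distributed storage cluster. This post simulates a brick failure and walks through replacing the failed brick:

1) The GlusterFS cluster has four nodes. Cluster details:

On every node, configure /etc/hosts, synchronize the system time, and disable the firewall and SELinux.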

[root@GlusterFS-slave data]# cat /etc/hosts

192.168.10.239  GlusterFS-master

192.168.10.212  GlusterFS-slave

192.168.10.204  GlusterFS-slave2

192.168.10.220  GlusterFS-slave3

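Create the storage directory on every node.

First, create a new partition:

# fdisk /dev/sdb        // enter in order: p -> n -> 1 -> Enter -> Enter -> w

Detect and verify the partition: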

# partx /dev/sdb

# ls /dev/sdb*

Create the file system:

# mkfs.xfs -i size=1024 /dev/sdb1

Configure the mount:

# mkdir -p /data

# echo '/dev/sdb1 /data xfs defaults 1 2' >> /etc/fstab

# mount -a && mount

Create the storage location:

# mkdir /data/gluster

The intermediate steps for deploying the GlusterFS cluster are omitted here; see http://www.cnblogs.com/kevingrace/p/8743812.html

[root@GlusterFS-master ~]# gluster peer status

Number of Peers: 3

Hostname: 192.168.10.212

Uuid: f8e69297-4690-488e-b765-c1c404810d6a

State: Peer in Cluster (Connected)

Hostname: 192.168.10.204

Uuid: a989394c-f64a-40c3-8bc5-820f623952c4

State: Peer in Cluster (Connected)

Hostname: 192.168.10.220

Uuid: dd99743a-285b-4aed-b3d6-e860f9efd965

State: Peer in Cluster (Connected)

[root@GlusterFS-master ~]# gluster volume info

Volume Name: models

Type: Distributed-Replicate

Volume ID: f1945b0b-67d6-4202-9198-639244ab0a6a

Status: Started

Number of Bricks: 2 x 2 = 4

Transport-type: tcp

Bricks:

Brick1: 192.168.10.239:/data/gluster

Brick2: 192.168.10.212:/data/gluster

Brick3: 192.168.10.204:/data/gluster

Brick4: 192.168.10.220:/data/gluster

Options Reconfigured:

auth.allow: 192.168.*

performance.write-behind: on

performance.io-thread-count: 32

performance.flush-behind: on

performance.cache-size: 128MB

features.quota: on
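The "2 x 2 = 4" line above means two replica sets of two bricks each. GlusterFS builds replica sets from consecutive bricks in the order they are listed, so Brick1/Brick2 form one set and Brick3/Brick4 the other. A minimal sketch of that grouping (the function name replica_sets is hypothetical, not a gluster command):

```shell
# replica_sets REPLICA_COUNT BRICK...
# Print the replica sets formed by taking bricks in listed order,
# REPLICA_COUNT at a time (hypothetical helper, for illustration only).
replica_sets() {
    rs_count=$1
    shift
    rs_i=0
    rs_n=1
    rs_line=""
    for brick in "$@"; do
        rs_line="${rs_line:+$rs_line }$brick"
        rs_i=$((rs_i + 1))
        if [ $((rs_i % rs_count)) -eq 0 ]; then
            echo "set $rs_n: $rs_line"
            rs_n=$((rs_n + 1))
            rs_line=""
        fi
    done
}
```

Calling it with 2 and the four bricks in the order shown by "gluster volume info" pairs 192.168.10.239 with 192.168.10.212, and 192.168.10.204 with 192.168.10.220, which is why GlusterFS-slave2 is the replica partner of GlusterFS-slave3 later on.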

Mount the GlusterFS volume on the client:

[root@Client ~]# mount -t glusterfs 192.168.10.239:models /data/gluster/

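2) Test the Gluster volume

Write test data:

[root@Client ~]# for i in `seq -w 1 100`; do cp -rp /var/log/messages /data/gluster/copy-test-$i; done

Confirm the writes (the count of 101 includes the "total" line printed by ls -l):

[root@Client ~]# ls -lA /data/gluster|wc -l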

101

Check each node as well. The 100 files are distributed into two balanced sets of 50: one set is replicated on nodes 1 and 2, the other on nodes 3 and 4.

[root@GlusterFS-master ~]# ll /data/gluster/|wc -l

51

[root@GlusterFS-master ~]# ls /data/gluster/

copy-test-001  copy-test-016  copy-test-028  copy-test-038  copy-test-054  copy-test-078  copy-test-088  copy-test-100

copy-test-004  copy-test-017  copy-test-029  copy-test-039  copy-test-057  copy-test-079  copy-test-090

copy-test-006  copy-test-019  copy-test-030  copy-test-041  copy-test-060  copy-test-081  copy-test-093

copy-test-008  copy-test-021  copy-test-031  copy-test-046  copy-test-063  copy-test-082  copy-test-094

copy-test-011  copy-test-022  copy-test-032  copy-test-048  copy-test-065  copy-test-083  copy-test-095

copy-test-012  copy-test-023  copy-test-033  copy-test-051  copy-test-073  copy-test-086  copy-test-098

copy-test-015  copy-test-024  copy-test-034  copy-test-052  copy-test-077  copy-test-087  copy-test-099

[root@GlusterFS-slave ~]# ll /data/gluster/|wc -l

51

[root@GlusterFS-slave ~]# ls /data/gluster/

copy-test-001  copy-test-016  copy-test-028  copy-test-038  copy-test-054  copy-test-078  copy-test-088  copy-test-100

copy-test-004  copy-test-017  copy-test-029  copy-test-039  copy-test-057  copy-test-079  copy-test-090

copy-test-006  copy-test-019  copy-test-030  copy-test-041  copy-test-060  copy-test-081  copy-test-093

copy-test-008  copy-test-021  copy-test-031  copy-test-046  copy-test-063  copy-test-082  copy-test-094

copy-test-011  copy-test-022  copy-test-032  copy-test-048  copy-test-065  copy-test-083  copy-test-095

copy-test-012  copy-test-023  copy-test-033  copy-test-051  copy-test-073  copy-test-086  copy-test-098

copy-test-015  copy-test-024  copy-test-034  copy-test-052  copy-test-077  copy-test-087  copy-test-099

[root@GlusterFS-slave2 ~]# ll /data/gluster/|wc -l

51

[root@GlusterFS-slave2 ~]# ls /data/gluster/

copy-test-002  copy-test-014  copy-test-036  copy-test-047  copy-test-059  copy-test-069  copy-test-080  copy-test-097

copy-test-003  copy-test-018  copy-test-037  copy-test-049  copy-test-061  copy-test-070  copy-test-084

copy-test-005  copy-test-020  copy-test-040  copy-test-050  copy-test-062  copy-test-071  copy-test-085

copy-test-007  copy-test-025  copy-test-042  copy-test-053  copy-test-064  copy-test-072  copy-test-089

copy-test-009  copy-test-026  copy-test-043  copy-test-055  copy-test-066  copy-test-074  copy-test-091

copy-test-010  copy-test-027  copy-test-044  copy-test-056  copy-test-067  copy-test-075  copy-test-092

copy-test-013  copy-test-035  copy-test-045  copy-test-058  copy-test-068  copy-test-076  copy-test-096

[root@GlusterFS-slave3 ~]# ll /data/gluster/|wc -l

51

[root@GlusterFS-slave3 ~]# ls /data/gluster/

copy-test-002  copy-test-014  copy-test-036  copy-test-047  copy-test-059  copy-test-069  copy-test-080  copy-test-097

copy-test-003  copy-test-018  copy-test-037  copy-test-049  copy-test-061  copy-test-070  copy-test-084

copy-test-005  copy-test-020  copy-test-040  copy-test-050  copy-test-062  copy-test-071  copy-test-085

copy-test-007  copy-test-025  copy-test-042  copy-test-053  copy-test-064  copy-test-072  copy-test-089

copy-test-009  copy-test-026  copy-test-043  copy-test-055  copy-test-066  copy-test-074  copy-test-091

copy-test-010  copy-test-027  copy-test-044  copy-test-056  copy-test-067  copy-test-075  copy-test-092

copy-test-013  copy-test-035  copy-test-045  copy-test-058  copy-test-068  copy-test-076  copy-test-096
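Comparing the listings above by eye is error-prone. A small sketch, assuming each brick listing has been saved to a text file with one name per line (for example via "ls -1" over ssh), prints the names that appear in only one of two listings. The function name listing_mismatch is hypothetical.

```shell
# listing_mismatch FILE_A FILE_B
# Print names present in exactly one of the two listings.
# Assumes one name per line and no duplicate names within a file.
# Prints nothing when the two listings match.
listing_mismatch() {
    sort "$1" "$2" | uniq -u
}
```

For a healthy replica pair such as GlusterFS-master and GlusterFS-slave, the output should be empty; any name printed is a file that one brick of the pair is missing.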

3) Simulate a brick failure

3.1) Check the current storage status

Run on the GlusterFS-slave3 node:

[root@GlusterFS-slave3 ~]# gluster volume status

Status of volume: models

Gluster process                          Port  Online  Pid

------------------------------------------------------------------------------

Brick 192.168.10.239:/data/gluster     49152 Y 6016

Brick 192.168.10.212:/data/gluster     49152 Y 2910

Brick 192.168.10.204:/data/gluster     49153 Y 9030

Brick 192.168.10.220:/data/gluster     49153 Y 12363

NFS Server on localhost         N/A N N/A

Self-heal Daemon on localhost       N/A Y 12382

Quota Daemon on localhost       N/A Y 12389

NFS Server on 192.168.10.204        N/A N N/A

Self-heal Daemon on 192.168.10.204      N/A Y 9049

Quota Daemon on 192.168.10.204        N/A Y 9056

NFS Server on GlusterFS-master        N/A N N/A

Self-heal Daemon on GlusterFS-master      N/A Y 6037

Quota Daemon on GlusterFS-master      N/A Y 6042

NFS Server on 192.168.10.212        N/A N N/A

Self-heal Daemon on 192.168.10.212      N/A Y 2930

Quota Daemon on 192.168.10.212        N/A Y 2936

Task Status of Volume models

------------------------------------------------------------------------------

Task                 : Rebalance

ID                   : f7bc799f-d8a8-488e-9c38-dc1f2c685a99

Status               : completed

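Note: the Online column is "Y" for all four bricks.

3.2) Induce the failure (this simulates a file system failure; it assumes the physical disk is healthy or the failed disk in the array has already been replaced)

Run on the GlusterFS-slave3 node:

[root@GlusterFS-slave3 ~]# vim /etc/fstab     // comment out the following line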

......

#/dev/sdb1 /data xfs defaults 1 2
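The same fstab edit can be scripted instead of done in vim. This is a sketch under assumptions: the helper names are hypothetical, and the device string is assumed to contain no regex metacharacters and no "|" character. It writes through a temporary file so the target file keeps its ownership and permissions.

```shell
# edit_in_place SED_EXPR FILE: apply a sed expression to FILE via a temp file.
edit_in_place() {
    eip_tmp=$(mktemp) || return 1
    sed "$1" "$2" > "$eip_tmp" && cat "$eip_tmp" > "$2"
    eip_status=$?
    rm -f "$eip_tmp"
    return $eip_status
}

# comment_out_mount FSTAB DEVICE: prefix the entry for DEVICE with '#'.
comment_out_mount() {
    edit_in_place "s|^$2[[:space:]]|#&|" "$1"
}

# uncomment_mount FSTAB DEVICE: remove the leading '#' from the entry for DEVICE.
uncomment_mount() {
    edit_in_place "s|^#\($2[[:space:]]\)|\1|" "$1"
}
```

On GlusterFS-slave3 this would be called as "comment_out_mount /etc/fstab /dev/sdb1" before the reboot, and "uncomment_mount /etc/fstab /dev/sdb1" when the mount is restored in step 4.2. Trying it on a copy of fstab first is the safer route.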

Reboot the server:

[root@GlusterFS-slave3 ~]# reboot

3.3) Check the current storage status

[root@GlusterFS-slave3 ~]# gluster volume status

Status of volume: models

Gluster process                           Port  Online  Pid

------------------------------------------------------------------------------

Brick 192.168.10.239:/data/gluster     49152 Y 6016

Brick 192.168.10.212:/data/gluster     49152 Y 2910

Brick 192.168.10.204:/data/gluster     49153 Y 9030

Brick 192.168.10.220:/data/gluster     N/A N N/A

NFS Server on localhost         N/A N N/A

Self-heal Daemon on localhost       N/A Y 12382

Quota Daemon on localhost       N/A Y 12389

NFS Server on GlusterFS-master        N/A N N/A

Self-heal Daemon on GlusterFS-master      N/A Y 6037

Quota Daemon on GlusterFS-master      N/A Y 6042

NFS Server on 192.168.10.204        N/A N N/A

Self-heal Daemon on 192.168.10.204      N/A Y 9049

Quota Daemon on 192.168.10.204        N/A Y 9056

NFS Server on 192.168.10.212        N/A N N/A

Self-heal Daemon on 192.168.10.212      N/A Y 2930

Quota Daemon on 192.168.10.212        N/A Y 2936

Task Status of Volume models

------------------------------------------------------------------------------

Task                 : Rebalance

ID                   : f7bc799f-d8a8-488e-9c38-dc1f2c685a99

Status               : completed

Note: the Online column for the brick on GlusterFS-slave3 (192.168.10.220) is now "N".
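Spotting the offline brick can be automated by filtering the status output. A minimal sketch, assuming each brick is printed on a single line as in the output above (long brick names may wrap in real output, which this does not handle); offline_bricks is a hypothetical helper name.

```shell
# offline_bricks: read "gluster volume status" output on stdin and print
# every brick whose Online column is "N". Online is taken as the
# second-to-last field, so an extra port column does not break it.
offline_bricks() {
    awk '$1 == "Brick" && $(NF-1) == "N" { print $2 }'
}
```

Typical use would be "gluster volume status models | offline_bricks", which for the output above prints 192.168.10.220:/data/gluster.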

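4) Recover the failed brick

4.1) Stop the process of the failed brick

If "gluster volume status" shows a PID (rather than N/A) for the GlusterFS-slave3 brick whose Online column is "N", terminate that process with "kill -15 pid".

Usually no PID is shown once the Online column is "N".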
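The PID check in step 4.1 can be sketched as a filter over the status output. The helper name brick_pid is hypothetical, and the sketch again assumes one line per brick.

```shell
# brick_pid BRICK: read "gluster volume status" output on stdin and print
# the PID of BRICK, or nothing when the Pid column is N/A.
brick_pid() {
    awk -v b="$1" '$1 == "Brick" && $2 == b && $NF != "N/A" { print $NF }'
}
```

A possible use: pid=$(gluster volume status models | brick_pid 192.168.10.220:/data/gluster), then run kill -15 "$pid" only if $pid is non-empty.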

4.2) Create a new data directory (it must never be the same as the previous one)

[root@GlusterFS-slave3 ~]# mkfs.xfs -i size=1024 /dev/sdb1

[root@GlusterFS-slave3 ~]# vim /etc/fstab           // uncomment the following line

......

/dev/sdb1 /data xfs defaults 1 2

Remount the file system:

[root@GlusterFS-slave3 ~]# mount -a

Add a new data directory (it must not reuse the previous directory name):

[root@GlusterFS-slave3 ~]# mkdir -p /data/gluster1

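4.3) Query the extended attributes of the brick directory on the failed node's replica partner (GlusterFS-slave2). If getfattr is not installed, "yum search getfattr" helps find the package that provides it.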

[root@GlusterFS-slave2 ~]# yum install -y attr.x86_64

[root@GlusterFS-slave2 ~]# getfattr -d -m. -e hex /data/gluster

getfattr: Removing leading '/' from absolute path names

# file: data/gluster

trusted.gfid=0x00000000000000000000000000000001

trusted.glusterfs.dht=0x00000001000000007fff7f58ffffffff

trusted.glusterfs.quota.dirty=0x3000

trusted.glusterfs.quota.size=0x0000000003c19000

trusted.glusterfs.volume-id=0xf1945b0b67d642029198639244ab0a6a

4.4) Mount the volume and trigger self-heal

On the client, first unmount the previous mount:

[root@Client ~]# umount /data/gluster

Then mount again through GlusterFS-slave3 (any node would work):

[root@Client ~]# mount -t glusterfs 192.168.10.220:models /data/gluster/

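Create a directory that does not yet exist in the volume, then delete it:

[root@Client ~]# cd /data/gluster

[root@Client gluster]# mkdir testDir001

[root@Client gluster]# rm -rf testDir001

Set and remove an extended attribute to trigger self-heal:

[root@Client gluster]# setfattr -n trusted.non-existent-key -v abc /data/gluster

[root@Client gluster]# setfattr -x trusted.non-existent-key /data/gluster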

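4.5) Check whether pending xattrs are set on the healthy replica

Query the extended attributes of the brick directory on the failed node's replica partner (GlusterFS-slave2) again: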

[root@GlusterFS-slave2 ~]# getfattr -d -m. -e hex /data/gluster

getfattr: Removing leading '/' from absolute path names

# file: data/gluster

trusted.afr.dirty=0x000000000000000000000000

trusted.afr.models-client-2=0x000000000000000000000000

trusted.afr.models-client-3=0x000000000000000200000002

trusted.gfid=0x00000000000000000000000000000001

trusted.glusterfs.dht=0x00000001000000007fff7f58ffffffff

trusted.glusterfs.quota.dirty=0x3000

trusted.glusterfs.quota.size=0x0000000003c19000

trusted.glusterfs.volume-id=0xf1945b0b67d642029198639244ab0a6a

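Note: look at line 5 of the output, trusted.afr.models-client-3. Its non-zero value means this brick has recorded pending changes for GlusterFS-slave3:/data/gluster, so the xattrs now mark GlusterFS-slave2 as the heal source for that brick.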
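A trusted.afr value is usually read as three 32-bit counters in hex: pending data, metadata, and entry operations, in that order. The sketch below splits a value on that assumption; decode_afr is a hypothetical helper, not part of GlusterFS.

```shell
# decode_afr HEXVALUE: split a trusted.afr.* value (24 hex digits, with or
# without a leading 0x) into its data / metadata / entry pending counters.
decode_afr() {
    afr_hex=${1#0x}
    if [ "${#afr_hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#afr_hex}" >&2
        return 1
    fi
    afr_data=$(printf '%s\n' "$afr_hex" | cut -c1-8)
    afr_meta=$(printf '%s\n' "$afr_hex" | cut -c9-16)
    afr_entry=$(printf '%s\n' "$afr_hex" | cut -c17-24)
    echo "data=$((0x$afr_data)) metadata=$((0x$afr_meta)) entry=$((0x$afr_entry))"
}
```

For the value 0x000000000000000200000002 shown above, this reports no pending data operations, two pending metadata operations, and two pending entry operations, which matches the mkdir/rm and setfattr calls made on the client.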

4.6) Check whether the volume status shows that a replacement is needed

[root@GlusterFS-slave3 data]# gluster volume heal models info

Brick GlusterFS-master:/data/gluster/

Number of entries: 0

Brick GlusterFS-slave:/data/gluster/

Number of entries: 0

Brick GlusterFS-slave2:/data/gluster/

/

Number of entries: 1

Brick 192.168.10.220:/data/gluster

Status: Transport endpoint is not connected

Note: the last line reports that the transport endpoint is not connected.
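The heal info output can be condensed to one line per brick. A minimal sketch, assuming the output layout shown above; heal_summary is a hypothetical helper name.

```shell
# heal_summary: read "gluster volume heal VOLUME info" output on stdin and
# print one line per brick: its pending entry count, or "disconnected"
# when the brick reports that the transport endpoint is not connected.
heal_summary() {
    awk '
        $1 == "Brick" { brick = $2 }
        /^Number of entries:/ { print brick, "entries=" $NF }
        /^Status: Transport endpoint is not connected/ { print brick, "disconnected" }
    '
}
```

Running "gluster volume heal models info | heal_summary" on the output above would show one pending entry on GlusterFS-slave2 and the 192.168.10.220 brick as disconnected.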

4.7) Complete the operation with a forced commit

[root@GlusterFS-slave3 data]# gluster volume replace-brick models 192.168.10.220:/data/gluster 192.168.10.220:/data/gluster1 commit force

The following message indicates success:

volume replace-brick: success: replace-brick commit force operation successful

Note: the data can also be recovered onto a different server (optional). The commands are as follows, where 192.168.10.230 is a newly added GlusterFS node:

# gluster peer probe 192.168.10.230

# gluster volume replace-brick models 192.168.10.220:/data/gluster 192.168.10.230:/data/gluster commit force
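Because the new brick must differ from the old one, it can help to build the command through a small guard before running it. This is only a sketch; replace_brick_cmd is a hypothetical helper that prints the command and does not execute it.

```shell
# replace_brick_cmd VOLUME OLD_BRICK NEW_BRICK
# Print the replace-brick command, refusing when old and new brick are identical.
replace_brick_cmd() {
    rb_vol=$1
    rb_old=$2
    rb_new=$3
    if [ "$rb_old" = "$rb_new" ]; then
        echo "new brick must differ from the old brick: $rb_old" >&2
        return 1
    fi
    echo "gluster volume replace-brick $rb_vol $rb_old $rb_new commit force"
}
```

Printing the command first leaves room to review it; once it looks right, it can be pasted into a shell on one of the cluster nodes.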

4.8) Check the online status of the storage

[root@GlusterFS-slave3 ~]# gluster volume status

Status of volume: models

Gluster process                          Port  Online  Pid

------------------------------------------------------------------------------

Brick 192.168.10.239:/data/gluster     49152 Y 6016

Brick 192.168.10.212:/data/gluster     49152 Y 2910

Brick 192.168.10.204:/data/gluster     49153 Y 9030

Brick 192.168.10.220:/data/gluster1    49153 Y 12363

NFS Server on localhost         N/A N N/A

Self-heal Daemon on localhost       N/A Y 12382

Quota Daemon on localhost       N/A Y 12389

NFS Server on 192.168.10.204        N/A N N/A

Self-heal Daemon on 192.168.10.204      N/A Y 9049

Quota Daemon on 192.168.10.204        N/A Y 9056

NFS Server on GlusterFS-master        N/A N N/A

Self-heal Daemon on GlusterFS-master      N/A Y 6037

Quota Daemon on GlusterFS-master      N/A Y 6042

NFS Server on 192.168.10.212        N/A N N/A

Self-heal Daemon on 192.168.10.212      N/A Y 2930

Quota Daemon on 192.168.10.212        N/A Y 2936

Task Status of Volume models

------------------------------------------------------------------------------

Task                 : Rebalance

ID                   : f7bc799f-d8a8-488e-9c38-dc1f2c685a99

Status               : completed

If the brick was moved to another server instead, the status looks like this:

[root@GlusterFS-slave ~]# gluster volume status

Status of volume: models

Gluster process                          Port  Online  Pid

------------------------------------------------------------------------------

Brick 192.168.10.239:/data/gluster    49152 Y 6016

Brick 192.168.10.212:/data/gluster     49152 Y 2910

Brick 192.168.10.204:/data/gluster     49153 Y 9030

Brick 192.168.10.230:/data/gluster     49153 Y 12363

NFS Server on localhost         N/A N N/A

Self-heal Daemon on localhost       N/A Y 12382

Quota Daemon on localhost       N/A Y 12389

NFS Server on 192.168.10.204        N/A N N/A

Self-heal Daemon on 192.168.10.204      N/A Y 9049

Quota Daemon on 192.168.10.204        N/A Y 9056

NFS Server on GlusterFS-master        N/A N N/A

Self-heal Daemon on GlusterFS-master      N/A Y 6037

Quota Daemon on GlusterFS-master      N/A Y 6042

NFS Server on 192.168.10.212        N/A N N/A

Self-heal Daemon on 192.168.10.212      N/A Y 2930

Quota Daemon on 192.168.10.212        N/A Y 2936

Task Status of Volume models

------------------------------------------------------------------------------

Task                 : Rebalance

ID                   : f7bc799f-d8a8-488e-9c38-dc1f2c685a99

Status               : completed

======================================================================

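Note: the steps above use a newly created dedicated partition, with the storage directory created on that partition. The storage directory can instead be created directly on the / partition without a dedicated partition,

as in http://www.cnblogs.com/kevingrace/p/8743812.html, where the storage directory on all four nodes is /opt/gluster/data.

Suppose the storage directory /opt/gluster/data on the GlusterFS-slave3 node is deleted by accident.

The simplest and most direct fix is:

1) As in the steps above, recreate the deleted /opt/gluster/data directory with mkdir.

2) Stop the replicated volume: gluster volume stop models

3) Delete the replicated volume: gluster volume delete models

4) Recreate the replicated volume under the same name, models. Four replicas are used here.

# gluster volume create models replica 4 192.168.10.239:/opt/gluster/data 192.168.10.212:/opt/gluster/data 192.168.10.204:/opt/gluster/data 192.168.10.220:/opt/gluster/data force

5) After recreating the volume, start it with gluster volume start models; gluster volume info then lists the bricks of all four nodes.

6) Remount the GlusterFS volume on the client.

This way, the data in the storage directory of the failed node GlusterFS-slave3 becomes consistent with that of the other replica group, GlusterFS-master and GlusterFS-slave.

Because GlusterFS-slave2 is the replica partner of GlusterFS-slave3, the storage directory on GlusterFS-slave2 will hold the union of the data from all nodes.

Alternatively, in step 4 above, recreate the volume with two replicas as before: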

# gluster volume create models replica 2 192.168.10.239:/opt/gluster/data 192.168.10.212:/opt/gluster/data force

Then add the other two nodes to the replicated volume:

# gluster volume stop models

# gluster volume status models

# gluster volume add-brick models 192.168.10.204:/opt/gluster/data 192.168.10.220:/opt/gluster/data force
