How to recover a failed Greenplum segment
[Preface]
Segment probing and failover mechanism
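The GP Master probes the primary first; if the primary is unreachable, it then probes the mirror. A primary/mirror pair can be in one of four states:
1. Primary up, mirror up. The GP Master probes the primary successfully, returns immediately, and moves on to the next segment.
2. Primary up, mirror down. The GP Master probes the primary successfully and learns from the status the primary returns that the mirror is down (the primary notices when its mirror goes down and switches itself into ChangeTracking mode). The Master updates its metadata and moves on to the next segment.
3. Primary down, mirror up. The probe of the primary fails, so the GP Master probes the mirror and finds it alive. The Master updates its metadata, has the mirror take over from the primary (failover), and moves on to the next segment.
4. Primary down, mirror down. The probe of the primary fails, and the probe of the mirror fails as well. The Master retries until the retry limit is reached, stops probing this segment without updating its metadata, and moves on to the next segment.
Cases 2 through 4 require segment recovery with gprecoverseg.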
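The four probe outcomes above amount to a small decision table. The sketch below only encodes the cases as this article describes them; the names `ProbeAction`, `probe_segment` and `needs_recovery` are hypothetical and are not taken from Greenplum's own fault-detection code.

```python
from enum import Enum


class ProbeAction(Enum):
    """What the master does after probing one primary/mirror pair."""
    NONE = "both up: nothing to update"
    MARK_MIRROR_DOWN = "mirror down: primary enters change tracking, metadata updated"
    FAILOVER = "primary down: mirror takes over, metadata updated"
    GIVE_UP = "both down: retries exhausted, metadata left unchanged"


def probe_segment(primary_alive: bool, mirror_alive: bool) -> ProbeAction:
    """Map the probe results for one segment pair to the action described above."""
    if primary_alive:
        # The mirror's state is reported by the primary, not probed directly.
        return ProbeAction.NONE if mirror_alive else ProbeAction.MARK_MIRROR_DOWN
    # The mirror is probed only after the primary probe has failed.
    return ProbeAction.FAILOVER if mirror_alive else ProbeAction.GIVE_UP


def needs_recovery(action: ProbeAction) -> bool:
    """Cases 2 through 4 leave a segment that has to be recovered."""
    return action is not ProbeAction.NONE


if __name__ == "__main__":
    for primary in (True, False):
        for mirror in (True, False):
            action = probe_segment(primary, mirror)
            print(primary, mirror, action.name, needs_recovery(action))
```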
Segments that are marked as failed are simply skipped when the database starts.
[gpadmin@mdw ~]$ gpstart
:::: gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args:
:::: gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
:::: gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gpstart:mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '
:::: gpstart:mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
:::: gpstart:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
:::: gpstart:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gpstart:mdw:gpadmin-[INFO]:-Setting new master era
:::: gpstart:mdw:gpadmin-[INFO]:-Master Started...
:::: gpstart:mdw:gpadmin-[INFO]:-Shutting down master
:::: gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /home/gpadmin/gpdata/gpdatam/gpseg0 <<<<<
:::: gpstart:mdw:gpadmin-[INFO]:---------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-Master instance parameters
:::: gpstart:mdw:gpadmin-[INFO]:---------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-Database = template1
:::: gpstart:mdw:gpadmin-[INFO]:-Master Port =
:::: gpstart:mdw:gpadmin-[INFO]:-Master directory = /home/gpadmin/gpdata/pgmaster/gpseg-
:::: gpstart:mdw:gpadmin-[INFO]:-Timeout = seconds
:::: gpstart:mdw:gpadmin-[INFO]:-Master standby = Off
:::: gpstart:mdw:gpadmin-[INFO]:---------------------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-Segment instances that will be started
:::: gpstart:mdw:gpadmin-[INFO]:---------------------------------------
:::: gpstart:mdw:gpadmin-[INFO]:- Host Datadir Port Role
:::: gpstart:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatap/gpseg0 Primary
:::: gpstart:mdw:gpadmin-[INFO]:- sdw2 /home/gpadmin/gpdata/gpdatap/gpseg1 Primary
:::: gpstart:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatam/gpseg1 Mirror
Continue with Greenplum instance startup Yy|Nn (default=N):
> y
:::: gpstart:mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
...........
:::: gpstart:mdw:gpadmin-[INFO]:-Process results...
:::: gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
:::: gpstart:mdw:gpadmin-[INFO]:- Successful segment starts =
:::: gpstart:mdw:gpadmin-[INFO]:- Failed segment starts =
:::: gpstart:mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = <<<<<<<<
:::: gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-
:::: gpstart:mdw:gpadmin-[INFO]:-Successfully started of segment instances, skipped other segments
:::: gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
:::: gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
:::: gpstart:mdw:gpadmin-[WARNING]:-There are segment(s) marked down in the database
:::: gpstart:mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
:::: gpstart:mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
:::: gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
:::: gpstart:mdw:gpadmin-[INFO]:-Starting Master
:::: gpstart:mdw:gpadmin-[INFO]:-Command pg_ctl reports Master mdw instance active
:::: gpstart:mdw:gpadmin-[INFO]:-No standby master configured. skipping...
:::: gpstart:mdw:gpadmin-[WARNING]:-Number of segments
:::: gpstart:mdw:gpadmin-[INFO]:-Check status of database with gpstate utility
Check the startup status of the database's mirror segments
[gpadmin@mdw ~]$ gpstate -m
:::: gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
:::: gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
:::: gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
:::: gpstate:mdw:gpadmin-[INFO]:--Type = Spread
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
:::: gpstate:mdw:gpadmin-[WARNING]:-sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 Failed <<<<<<<<
:::: gpstate:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatam/gpseg1 Passive Synchronized
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[WARNING]:- segment(s) configured as mirror(s) have failed
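The failed mirror is easy to spot: "[WARNING]:-sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 50000 Failed"
How do we recover this mirror segment? A failed primary segment is recovered in exactly the same way.
1. First, generate a recovery configuration file: gprecoverseg -o ./recov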
[gpadmin@mdw ~]$ gprecoverseg -o ./recov
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Starting gprecoverseg with args: -o ./recov
:::: gprecoverseg:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gprecoverseg:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Checking if segments are ready
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Configuration file output to ./recov successfully.
2. Inspect the recovery configuration file to see which segments need to be recovered
[gpadmin@mdw ~]$ cat recov
filespaceOrder=fastdisk
sdw2::/home/gpadmin/gpdata/gpdatam/gpseg0
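To review a longer recovery file programmatically, it can be read with a few lines of Python. This is a minimal sketch under stated assumptions: it assumes the layout seen in this transcript, a `filespaceOrder=` line followed by one `host:port:datadir` entry per failed segment, optionally followed by a space and a recovery target. Skipping `#` comment lines is a defensive assumption, and `FailedSegment` and `parse_recovery_config` are hypothetical names. The port 50000 in the sample is the one reported by gpstate for this mirror.

```python
from typing import List, NamedTuple, Optional, Tuple


class FailedSegment(NamedTuple):
    host: str
    port: str               # kept as text; may be empty in a damaged file
    datadir: str
    target: Optional[str]   # raw recovery-target text, or None for in-place


def parse_recovery_config(text: str) -> Tuple[List[str], List[FailedSegment]]:
    """Return (filespace order, failed segments) from a gprecoverseg -o file."""
    filespace_order: List[str] = []
    segments: List[FailedSegment] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # assumption: tolerate comments
            continue
        if line.startswith("filespaceOrder="):
            value = line.split("=", 1)[1]
            filespace_order = [name for name in value.split(":") if name]
            continue
        failed, _, target = line.partition(" ")
        host, port, datadir = failed.split(":", 2)
        segments.append(FailedSegment(host, port, datadir, target.strip() or None))
    return filespace_order, segments


SAMPLE = """filespaceOrder=fastdisk
sdw2:50000:/home/gpadmin/gpdata/gpdatam/gpseg0
"""

if __name__ == "__main__":
    order, failed = parse_recovery_config(SAMPLE)
    print(order)
    for seg in failed:
        print(seg.host, seg.port, seg.datadir, seg.target)
```

A line with no second field is treated as an in-place recovery, which matches the "Recovery Target = in-place" line that gprecoverseg prints in the next step.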
3. Run the recovery with this configuration file: gprecoverseg -i ./recov
[gpadmin@mdw ~]$ gprecoverseg -i ./recov
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Starting gprecoverseg with args: -i ./recov
:::: gprecoverseg:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gprecoverseg:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Checking if segments are ready
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Greenplum instance recovery parameters
:::: gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Recovery from configuration -i option supplied
:::: gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Recovery of
:::: gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Synchronization mode = Incremental
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance host = sdw2
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance address = sdw2
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance directory = /home/gpadmin/gpdata/gpdatam/gpseg0
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance port =
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance replication port =
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance fastdisk directory = /data/gpdata/seg1/pg_mir_cdr/gpseg0
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance host = sdw1
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance address = sdw1
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance directory = /home/gpadmin/gpdata/gpdatap/gpseg0
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance port =
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance replication port =
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance fastdisk directory = /data/gpdata/seg1/pg_pri_cdr/gpseg0
:::: gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Target = in-place
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Process results...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Done updating primaries
:::: gprecoverseg:mdw:gpadmin-[INFO]:-******************************************************************
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Updating segments for resynchronization is completed.
:::: gprecoverseg:mdw:gpadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
:::: gprecoverseg:mdw:gpadmin-[INFO]:-
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Use gpstate -s to check the resynchronization progress.
:::: gprecoverseg:mdw:gpadmin-[INFO]:-******************************************************************
4. Check the recovery status
[gpadmin@mdw ~]$ gpstate -m
:::: gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
:::: gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
:::: gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
:::: gpstate:mdw:gpadmin-[INFO]:--Type = Spread
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
:::: gpstate:mdw:gpadmin-[INFO]:- sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 Passive Resynchronizing
:::: gpstate:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatam/gpseg1 Passive Synchronized
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
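Resynchronization runs in the background, so this check usually has to be repeated. The sketch below shows one way to pick the mirror rows out of captured `gpstate -m` text and list the mirrors that are not yet synchronized. It assumes the column layout seen in this transcript (host, data directory, optional port, status, data status); `MirrorRow`, `parse_mirror_rows` and `pending_mirrors` are hypothetical names, and the code only parses text rather than invoking gpstate.

```python
from typing import List, NamedTuple


class MirrorRow(NamedTuple):
    host: str
    datadir: str
    port: str          # empty when the column is missing from the captured text
    status: str        # e.g. Passive, Failed
    data_status: str   # e.g. Synchronized, Resynchronizing; empty if absent


def parse_mirror_rows(output: str) -> List[MirrorRow]:
    """Extract mirror rows from the text printed by `gpstate -m`."""
    rows: List[MirrorRow] = []
    for line in output.splitlines():
        # Everything after the "[INFO]:-" / "[WARNING]:-" prefix is the payload.
        _, sep, payload = line.partition("]:-")
        if not sep:
            continue
        tokens = payload.split()
        # A data row has a host followed by an absolute data directory.
        if len(tokens) < 3 or not tokens[1].startswith("/"):
            continue
        host, datadir, rest = tokens[0], tokens[1], tokens[2:]
        port = rest.pop(0) if rest[0].isdigit() else ""
        status = rest[0] if rest else ""
        data_status = rest[1] if len(rest) > 1 else ""
        rows.append(MirrorRow(host, datadir, port, status, data_status))
    return rows


def pending_mirrors(rows: List[MirrorRow]) -> List[MirrorRow]:
    """Mirrors that are failed or still catching up."""
    return [r for r in rows if r.status == "Failed" or r.data_status != "Synchronized"]


SAMPLE = """gpstate:mdw:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
gpstate:mdw:gpadmin-[INFO]:- sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 50000 Passive Resynchronizing
gpstate:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatam/gpseg1 Passive Synchronized
"""

if __name__ == "__main__":
    for row in pending_mirrors(parse_mirror_rows(SAMPLE)):
        print(row.host, row.datadir, row.status, row.data_status)
```

Recovery can be considered finished once `pending_mirrors` returns an empty list.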
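5. At this point the primary/mirror pair is restored, but one optional step remains.
You can decide whether to swap the primary and mirror roles back, because after a failover each instance runs in the opposite of its preferred role.
To swap them back, wait until every mirror is reported as Synchronized, then run the command below. It interrupts database service while the roles are switched.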
gprecoverseg -r
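Whether a rebalance is needed can be read from the `gp_segment_configuration` catalog table, which records both `role` and `preferred_role` for every instance. The sketch below keeps the query as a plain string and applies the same filter in Python, so it runs without a database connection. The sample rows, including the dbid values, are made up for illustration, and `segments_out_of_role` is a hypothetical helper.

```python
from typing import List, Tuple

# Query to run on the master, for example through psql.
# content >= 0 excludes the master instance itself.
ROLE_MISMATCH_SQL = """
SELECT content, dbid, hostname, role, preferred_role
FROM gp_segment_configuration
WHERE content >= 0 AND role <> preferred_role
ORDER BY content, dbid
"""

# (content, dbid, hostname, role, preferred_role)
SegmentRow = Tuple[int, int, str, str, str]


def segments_out_of_role(rows: List[SegmentRow]) -> List[SegmentRow]:
    """Return the instances whose current role differs from the preferred one."""
    return [row for row in rows if row[3] != row[4]]


# Hypothetical snapshot after a failover of content 0: the instance on sdw2
# is acting as primary although it is configured to be the mirror.
SAMPLE_ROWS: List[SegmentRow] = [
    (0, 2, "sdw1", "m", "p"),
    (0, 4, "sdw2", "p", "m"),
    (1, 3, "sdw2", "p", "p"),
    (1, 5, "sdw1", "m", "m"),
]

if __name__ == "__main__":
    swapped = segments_out_of_role(SAMPLE_ROWS)
    if swapped:
        print("rebalance candidates:", swapped)
    else:
        print("all segments are in their preferred roles")
```

An empty result means every segment is already in its preferred role and `gprecoverseg -r` has nothing to do.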
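[Summary]
gprecoverseg is the utility for repairing segments. It is simple to use, and has only a few main options:
-i: the main option. It names a configuration file that describes the segments to repair and the location each one should be restored to.
-F: optional. When given, gprecoverseg deletes the instances listed in the "-i" file or marked "d" (down), and copies a complete replica from the surviving peer instance to the target location.
-r: after FTS detects a failed primary and fails over to its mirror, the mirror that is acting as primary does not switch back automatically once gprecoverseg has repaired the failed instance. Some hosts then carry too many active segments, which can become a performance bottleneck. Restoring each segment to its original role is called re-balancing.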