【Preface】

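Segment detection and failover mechanism
The GP Master first probes the Primary; only if the Primary is unreachable does it probe the Mirror. A Primary/Mirror pair can be in one of four states:
1. Primary up, Mirror up. The GP Master's probe of the Primary succeeds, so it returns immediately and moves on to the next Segment.
2. Primary up, Mirror down. The probe of the Primary succeeds, and the status returned by the Primary shows that the Mirror is down (when the Mirror goes down, the Primary detects it and switches itself to ChangeTracking mode). The Master updates its metadata and moves on to the next Segment.
3. Primary down, Mirror up. The probe of the Primary fails, so the GP Master probes the Mirror and finds it alive. The Master updates its metadata, has the Mirror take over from the Primary (failover), and moves on to the next Segment.
4. Primary down, Mirror down. The probe of the Primary fails, and so does the probe of the Mirror. Once the maximum number of retries is reached, the GP Master stops probing this Segment, leaves the Master metadata unchanged, and moves on to the next Segment.
In cases 2-4, the failed segment must be recovered with gprecoverseg.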
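
The four cases above can be written as a small decision table. The following is only a sketch of the logic as described here, not Greenplum's actual FTS code; the names `Action`, `probe_segment` and `needs_gprecoverseg` are made up for illustration.

```python
from enum import Enum


class Action(Enum):
    """What the master does after probing one primary/mirror pair (hypothetical names)."""
    NONE = "both up: nothing to update"
    MARK_MIRROR_DOWN = "update master metadata: mirror is down"
    FAILOVER = "update master metadata and let the mirror take over"
    GIVE_UP = "both down: stop probing, leave metadata unchanged"


def probe_segment(primary_up: bool, mirror_up: bool) -> Action:
    """Map the four Primary/Mirror states to the action described in the text.

    When the primary is up, mirror_up stands for the mirror status that the
    primary reports back; the mirror itself is only probed when the primary
    is unreachable.
    """
    if primary_up:
        return Action.NONE if mirror_up else Action.MARK_MIRROR_DOWN
    return Action.FAILOVER if mirror_up else Action.GIVE_UP


def needs_gprecoverseg(action: Action) -> bool:
    """Cases 2-4 all leave a failed segment that has to be recovered."""
    return action is not Action.NONE


if __name__ == "__main__":
    for primary_up in (True, False):
        for mirror_up in (True, False):
            action = probe_segment(primary_up, mirror_up)
            print(primary_up, mirror_up, action.name, needs_gprecoverseg(action))
```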

Segment instances that are marked as failed are simply skipped at startup.

[gpadmin@mdw ~]$ gpstart
:::: gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args:
:::: gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
:::: gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gpstart:mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '
:::: gpstart:mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
:::: gpstart:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
:::: gpstart:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gpstart:mdw:gpadmin-[INFO]:-Setting new master era
:::: gpstart:mdw:gpadmin-[INFO]:-Master Started...
:::: gpstart:mdw:gpadmin-[INFO]:-Shutting down master
:::: gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /home/gpadmin/gpdata/gpdatam/gpseg0 <<<<<
:::: gpstart:mdw:gpadmin-[INFO]:---------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-Master instance parameters
:::: gpstart:mdw:gpadmin-[INFO]:---------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-Database                 = template1
:::: gpstart:mdw:gpadmin-[INFO]:-Master Port              =
:::: gpstart:mdw:gpadmin-[INFO]:-Master directory         = /home/gpadmin/gpdata/pgmaster/gpseg-
:::: gpstart:mdw:gpadmin-[INFO]:-Timeout                  =  seconds
:::: gpstart:mdw:gpadmin-[INFO]:-Master standby           = Off
:::: gpstart:mdw:gpadmin-[INFO]:---------------------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-Segment instances that will be started
:::: gpstart:mdw:gpadmin-[INFO]:---------------------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-   Host   Datadir                               Port    Role
:::: gpstart:mdw:gpadmin-[INFO]:-   sdw1   /home/gpadmin/gpdata/gpdatap/gpseg0      Primary
:::: gpstart:mdw:gpadmin-[INFO]:-   sdw2   /home/gpadmin/gpdata/gpdatap/gpseg1      Primary
:::: gpstart:mdw:gpadmin-[INFO]:-   sdw1   /home/gpadmin/gpdata/gpdatam/gpseg1      Mirror

Continue with Greenplum instance startup Yy|Nn (default=N):
> y
:::: gpstart:mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
...........
:::: gpstart:mdw:gpadmin-[INFO]:-Process results...
:::: gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-   Successful segment starts                                            =
:::: gpstart:mdw:gpadmin-[INFO]:-   Failed segment starts                                                =
:::: gpstart:mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration)   =    <<<<<<<<
:::: gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
:::: gpstart:mdw:gpadmin-[INFO]:-
:::: gpstart:mdw:gpadmin-[INFO]:-Successfully started  of  segment instances, skipped  other segments
:::: gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
:::: gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
:::: gpstart:mdw:gpadmin-[WARNING]:-There are  segment(s) marked down in the database
:::: gpstart:mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
:::: gpstart:mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
:::: gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
:::: gpstart:mdw:gpadmin-[INFO]:-Starting Master
:::: gpstart:mdw:gpadmin-[INFO]:-Command pg_ctl reports Master mdw instance active
:::: gpstart:mdw:gpadmin-[INFO]:-No standby master configured.  skipping...
:::: gpstart:mdw:gpadmin-[WARNING]:-Number of segments
:::: gpstart:mdw:gpadmin-[INFO]:-Check status of database with gpstate utility

Check the startup status of the database's mirror segments:

[gpadmin@mdw ~]$ gpstate -m
:::: gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
:::: gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
:::: gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
:::: gpstate:mdw:gpadmin-[INFO]:--Type = Spread
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir                               Port    Status    Data Status
:::: gpstate:mdw:gpadmin-[WARNING]:-sdw2     /home/gpadmin/gpdata/gpdatam/gpseg0      Failed                   <<<<<<<<
:::: gpstate:mdw:gpadmin-[INFO]:-   sdw1     /home/gpadmin/gpdata/gpdatam/gpseg1      Passive   Synchronized
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[WARNING]:- segment(s) configured as mirror(s) have failed

The failed mirror is easy to spot: "[WARNING]:-sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 50000 Failed"
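
On a cluster with many segments it may be easier to pick out the failed rows with a script. The sketch below assumes the row layout shown in the gpstate -m output above (host, data directory, optional port, status, after the "]:-" log prefix); the function name `failed_mirrors` is hypothetical, and the layout may differ in other Greenplum versions.

```python
from typing import List, Tuple


def failed_mirrors(gpstate_output: str) -> List[Tuple[str, str]]:
    """Return (host, datadir) for every mirror row whose status is Failed."""
    failed = []
    for line in gpstate_output.splitlines():
        # The table row is whatever follows the "]:-" log prefix.
        _, sep, row = line.partition("]:-")
        if not sep:
            continue
        fields = row.split()
        # A mirror row starts with a host name followed by an absolute path;
        # header, separator and summary lines do not match this shape.
        if len(fields) >= 3 and fields[1].startswith("/") and "Failed" in fields[2:]:
            failed.append((fields[0], fields[1]))
    return failed


if __name__ == "__main__":
    sample = "\n".join([
        "gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir   Port    Status    Data Status",
        "gpstate:mdw:gpadmin-[WARNING]:-sdw2     /home/gpadmin/gpdata/gpdatam/gpseg0      Failed",
        "gpstate:mdw:gpadmin-[INFO]:-   sdw1     /home/gpadmin/gpdata/gpdatam/gpseg1      Passive   Synchronized",
    ])
    print(failed_mirrors(sample))
```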

How do we recover this mirror segment? A failed primary segment is recovered in exactly the same way.

1. First, generate a recovery configuration file: gprecoverseg -o ./recov

[gpadmin@mdw ~]$ gprecoverseg -o ./recov
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Starting gprecoverseg with args: -o ./recov
:::: gprecoverseg:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gprecoverseg:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Checking if segments are ready
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Configuration file output to ./recov successfully.

2. Inspect the recovery configuration file; it shows which segments need to be recovered.

[gpadmin@mdw ~]$ cat recov
filespaceOrder=fastdisk
sdw2::/home/gpadmin/gpdata/gpdatam/gpseg0

3. Run the recovery with this configuration file: gprecoverseg -i ./recov

[gpadmin@mdw ~]$ gprecoverseg -i ./recov
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Starting gprecoverseg with args: -i ./recov
:::: gprecoverseg:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gprecoverseg:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Checking if segments are ready
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Greenplum instance recovery parameters
:::: gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Recovery from configuration -i option supplied
:::: gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Recovery  of
:::: gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Synchronization mode                          = Incremental
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance host                          = sdw2
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance address                       = sdw2
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance directory                     = /home/gpadmin/gpdata/gpdatam/gpseg0
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance port                          =
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance replication port              =
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance fastdisk directory            = /data/gpdata/seg1/pg_mir_cdr/gpseg0
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance host                 = sdw1
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance address              = sdw1
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance directory            = /home/gpadmin/gpdata/gpdatap/gpseg0
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance port                 =
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance replication port     =
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance fastdisk directory   = /data/gpdata/seg1/pg_pri_cdr/gpseg0
:::: gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Target                               = in-place
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Process results...
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Done updating primaries
:::: gprecoverseg:mdw:gpadmin-[INFO]:-******************************************************************
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Updating segments for resynchronization is completed.
:::: gprecoverseg:mdw:gpadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
:::: gprecoverseg:mdw:gpadmin-[INFO]:-
:::: gprecoverseg:mdw:gpadmin-[INFO]:-Use  gpstate -s  to check the resynchronization progress.
:::: gprecoverseg:mdw:gpadmin-[INFO]:-******************************************************************

4. Check the recovery status

[gpadmin@mdw ~]$ gpstate -m
:::: gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
:::: gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
:::: gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
:::: gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
:::: gpstate:mdw:gpadmin-[INFO]:--Type = Spread
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
:::: gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir                               Port    Status    Data Status
:::: gpstate:mdw:gpadmin-[INFO]:-   sdw2     /home/gpadmin/gpdata/gpdatam/gpseg0      Passive   Resynchronizing
:::: gpstate:mdw:gpadmin-[INFO]:-   sdw1     /home/gpadmin/gpdata/gpdatam/gpseg1      Passive   Synchronized
:::: gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------

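5. At this point the primary/mirror pairs are restored, but one optional step remains.
You may want to swap the primary and mirror roles back, because after a failover the segments' current roles are the reverse of their preferred roles.
To swap them back, use the following command. Note that it interrupts database service while the roles are switched.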

gprecoverseg -r

【Summary】

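gprecoverseg is the utility used to repair Segments. It is fairly simple to use; its main options are:
 -i : The main option. Specifies a configuration file that describes the Segments to repair and the target location for each one after repair.
 -F : Optional. When specified, gprecoverseg deletes the instances listed in the "-i" file or marked "d", and copies a complete instance from the surviving Mirror to the target location.
 -r : When FTS finds a Primary down and performs a failover, the Mirror that is acting as Primary does not switch back right away after gprecoverseg finishes the repair. Some hosts then run too many active Segments, which becomes a performance bottleneck. The Segments therefore need to be returned to their original roles; this is called re-balance.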
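
To tie the steps and options together, here is a small sketch that only builds the command lines used in this post, in order, without running anything. The helper name `recovery_commands` and its parameters are hypothetical; the flags are the ones described above (-o, -i, -F, -r).

```python
from typing import List


def recovery_commands(config_path: str, full: bool = False,
                      rebalance: bool = False) -> List[List[str]]:
    """Return the argv lists for the recovery workflow, in the order to run them."""
    recover = ["gprecoverseg", "-i", config_path]
    if full:
        # -F: discard the failed instance and copy a complete one instead of
        # resynchronizing incrementally.
        recover.append("-F")
    commands = [
        ["gprecoverseg", "-o", config_path],  # step 1: write the recovery config file
        recover,                              # step 3: recover using that file
        ["gpstate", "-m"],                    # step 4: check mirror status
    ]
    if rebalance:
        # Step 5 (optional): return segments to their preferred roles.
        # Run this only once the mirrors are back in sync.
        commands.append(["gprecoverseg", "-r"])
    return commands


if __name__ == "__main__":
    for argv in recovery_commands("./recov", rebalance=True):
        print(" ".join(argv))
```

Each list could be passed to `subprocess.run` as the gpadmin user on the master host, ideally after reviewing the generated configuration file (step 2) by hand.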
