How to keep the regionserver from crashing

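Why does the session between the regionserver and ZooKeeper expire? Possible causes:

1. A poor network.

2. A Java full GC, which blocks all threads. If the pause lasts long enough, the session expires as well.

What can be done?

1. Lengthen the ZooKeeper timeout.

2. Set "hbase.regionserver.restart.on.zk.expire" to true. With this setting, a regionserver that hits a ZooKeeper session expiry restarts instead of aborting.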

Concretely, add the following to hbase-site.xml:

<property>
<name>zookeeper.session.timeout</name>
<!-- add a <value> here: the session timeout in milliseconds -->
<description>ZooKeeper session timeout.
HBase passes this to the zk quorum as suggested maximum time for a
session. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
“The client sends a requested timeout, the server responds with the
timeout that it can give the client. The current implementation
requires that the timeout be a minimum of 2 times the tickTime
(as set in the server configuration) and a maximum of 20 times
the tickTime.” Set the zk ticktime with hbase.zookeeper.property.tickTime.
In milliseconds.
</description>
</property>

<property>
<name>hbase.regionserver.restart.on.zk.expire</name>
<value>true</value>
<description>
Zookeeper session expired will force regionserver exit.
Enable this will make the regionserver restart.
</description>
</property>
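
The rule quoted in the description above can be sketched in a few lines of shell. This is only an illustration of the 2x to 20x tickTime clamp: the function name `negotiated_timeout` and the sample numbers are assumptions, not part of HBase or ZooKeeper.

```shell
#!/bin/sh
# Hypothetical helper: clamp a requested ZooKeeper session timeout (ms)
# to the [2 * tickTime, 20 * tickTime] range quoted in the description.
negotiated_timeout() {
    requested=$1
    tick=$2
    min=$((tick * 2))
    max=$((tick * 20))
    if [ "$requested" -lt "$min" ]; then
        echo "$min"
    elif [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# Example values only: a 120 s request against a 3 s tickTime is capped at 60 s.
negotiated_timeout 120000 3000
```

Under that rule, raising zookeeper.session.timeout beyond 20 times the tickTime has no further effect unless hbase.zookeeper.property.tickTime is raised as well.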

To keep thread suspension during a Java full GC from disrupting the ZooKeeper heartbeat, hbase-env.sh also needs to be configured.

Change

export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError \
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

to

export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError \
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled \
-XX:CMSInitiatingOccupancyFraction=70 \
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseParNewGC -Xmn256m"
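
When editing a long option string like this it is easy to mix up the two -XX forms: boolean switches are written -XX:+Name or -XX:-Name, while options that take a value are written -XX:Name=value. The following is a rough sketch of a lint for that; `check_xx_flag` is a made-up helper, not a JVM or HBase tool.

```shell
#!/bin/sh
# Hypothetical lint: classify a HotSpot -XX option by its syntax.
# Boolean options look like -XX:+Name or -XX:-Name; valued options look
# like -XX:Name=value. A switch that also carries a value is not valid.
check_xx_flag() {
    case "$1" in
        -XX:[+-]*=*)  echo "malformed" ;;   # +/- switch combined with =value
        -XX:[+-]?*)   echo "boolean" ;;
        -XX:?*=?*)    echo "valued" ;;
        *)            echo "unknown" ;;
    esac
}

for flag in -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
            -XX:+CMSInitiatingOccupancyFraction=70; do
    echo "$flag: $(check_xx_flag "$flag")"
done
```

The check only looks at the shape of the option; it does not know whether the JVM in use actually supports a given name.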


In addition, when the Linux limit on open files is set too low, scanning several column families can also bring the regionserver down.
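
A quick way to see the current limit is `ulimit -n`, run as the user that starts the regionserver. The sketch below wraps it in a small check; the helper name `nofile_ok` and the 32768 threshold are assumptions chosen for illustration, not values from HBase.

```shell
#!/bin/sh
# Hypothetical check: compare the open-file limit with a minimum.
# The 32768 threshold is an assumed example value, not an HBase requirement.
nofile_ok() {
    limit=$1
    minimum=$2
    # "unlimited" always satisfies the minimum
    if [ "$limit" = "unlimited" ] || [ "$limit" -ge "$minimum" ]; then
        echo "ok"
    else
        echo "too low"
    fi
}

# ulimit -n prints the soft limit on open file descriptors for this shell.
echo "open files: $(ulimit -n) -> $(nofile_ok "$(ulimit -n)" 32768)"
```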


I keep forgetting the JVM settings, so here they are for reference.

1: heap size

a: -Xmx

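Sets the maximum JVM heap size, e.g. -Xmx2g

b: -Xms

Sets the minimum JVM heap size, e.g. -Xms2g. For highly concurrent applications it is best to make it equal to -Xmx, so that shrinking or suddenly growing the heap does not hurt performance.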

c: -Xmn

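Sets the size of the New Generation, e.g. -Xmn256m. This parameter has a large effect on performance. If the program needs a lot of short-lived memory, consider 512M; if it uses little, keep the value as low as possible. In general 128M or 256M is enough.

d: -XX:PermSize=

Sets the initial (minimum) size of the Perm Generation, e.g. -XX:PermSize=32m. The right value depends on the application; the jmap command shows how much is actually needed.

e: -XX:MaxPermSize=

Sets the maximum size of the Perm Generation, e.g. -XX:MaxPermSize=64m

f: -Xss

Sets the thread stack size, e.g. -Xss128k. In general, applications on the Webx framework need 256K. If the program recurses deeply, consider 512K or 1M; only thorough testing will tell. That said, 256K is already large. This parameter has a considerable effect on performance.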

g: -XX:NewRatio=

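Sets the ratio of the Old Generation heap size to the New Generation, e.g. -XX:NewRatio=2. This parameter has no effect when the CMS GC is used.

h: -XX:SurvivorRatio=

Sets the ratio of Eden Space to one Survivor Space within the New Generation. With -XX:SurvivorRatio=8 and a New Generation of 10m in total, Eden Space is 8m and each of the two Survivor Spaces is 1m.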

i: -XX:MinHeapFreeRatio=

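Sets the minimum percentage of the heap that should be free after a GC; when free space falls below n%, the heap is expanded. Has no effect when Xmx == Xms. E.g. -XX:MinHeapFreeRatio=30

j: -XX:MaxHeapFreeRatio=

Sets the maximum percentage of the heap that may be free after a GC; when free space rises above n%, the heap is shrunk. Has no effect when Xmx == Xms. E.g. -XX:MaxHeapFreeRatio=70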

k: -XX:LargePageSizeInBytes=

Sets the large page size used for the Java heap, e.g. -XX:LargePageSizeInBytes=128m
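
To make the arithmetic behind -Xmx, -Xmn and -XX:SurvivorRatio concrete, here is a minimal sketch that splits a heap into its generations. The helper `heap_layout` is hypothetical, the 64 MB heap in the example is an assumed value, and sizes are rounded down to whole megabytes, so the JVM's own alignment may give slightly different numbers.

```shell
#!/bin/sh
# Hypothetical helper: split the heap into generations, all sizes in MB.
# Old generation = Xmx - Xmn; the new generation holds Eden plus two
# survivor spaces, with Eden = SurvivorRatio * one survivor space.
heap_layout() {
    xmx=$1; xmn=$2; ratio=$3
    survivor=$((xmn / (ratio + 2)))   # integer division, rounds down
    eden=$((survivor * ratio))
    old=$((xmx - xmn))
    echo "old=${old}m eden=${eden}m survivor=${survivor}m x2"
}

# The SurvivorRatio example from item h: a 10m New Generation with ratio 8.
heap_layout 64 10 8
```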

2: garbage collector

a: -XX:+UseParallelGC

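Uses the parallel collector in the New Generation: application threads are paused while several GC threads collect in parallel. It cannot be combined with the CMS GC. It favors system throughput but can cause fairly long application pauses, so it suits back-end batch jobs.

b: -XX:ParallelGCThreads=

Sets the number of threads started for a parallel collection; the default is the number of physical processors.

c: -XX:+UseParallelOldGC

Uses the parallel collector in the Old Generation.

d: -XX:+UseParNewGC

Uses a parallel collector in the New Generation. It is an upgraded version of the UseParallelGC collector with better performance and other advantages, and it can be combined with the CMS GC.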

e: -XX:+CMSParallelRemarkEnabled

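When UseParNewGC is in use, keeps the remark time as short as possible.

f: -XX:+UseConcMarkSweepGC

Uses the concurrent mark sweep GC in the Old Generation: GC threads run concurrently with application threads, which are paused only during init-mark and remark. Application pauses are short, which suits highly interactive systems such as web servers.

g: -XX:+UseCMSCompactAtFullCollection

With the concurrent GC, compacts live objects to prevent memory fragmentation.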

h: -XX:CMSInitiatingOccupancyFraction=

Starts the concurrent collector once n% of the old generation is in use. The default is 68. E.g. -XX:CMSInitiatingOccupancyFraction=70

There is a bug that shows up on old JVMs (1.5.0.09 and earlier):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6486089

i: -XX:+UseCMSInitiatingOccupancyOnly

Makes the concurrent collector start only once the old generation reaches the configured occupancy fraction.
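
As a worked example of items h and i, the occupancy at which a CMS cycle starts is simply the old generation size times the configured fraction. The helper below is a hypothetical sketch; it assumes the old generation is the heap minus -Xmn.

```shell
#!/bin/sh
# Hypothetical helper: old-generation occupancy (MB) at which a CMS cycle
# starts when -XX:+UseCMSInitiatingOccupancyOnly is set.
cms_trigger_mb() {
    old_mb=$1
    fraction=$2          # value of -XX:CMSInitiatingOccupancyFraction
    echo $((old_mb * fraction / 100))
}

# Assumed example: a 2048 MB heap with -Xmn256m leaves 1792 MB of old generation.
cms_trigger_mb 1792 70
```

With these example numbers the concurrent collector would start at roughly 1254 MB of old generation usage, leaving the remaining space as headroom while the cycle runs.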

3: others

a: -XX:MaxTenuringThreshold=

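Sets the number of young GCs (n) an object survives before it is moved to the old generation. The default on 64-bit Linux with Java 6 is 15. This parameter has no effect on the throughput collector. E.g. -XX:MaxTenuringThreshold=31

b: -XX:+DisableExplicitGC

Disables explicit full GCs requested by Java code, such as calls to System.gc(). It is best to add this so that accidental calls in application code do not hurt performance.

c: -XX:+UseFastAccessorMethods

Compiles getter and setter methods to native code.

d: -XX:+PrintGCDetails

Prints garbage collection details, e.g.: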

[GC 15610.466: [ParNew: 229689K->20221K(235968K), 0.0194460 secs] 1159829K->953935K(2070976K), 0.0196420 secs]

e: -XX:+PrintGCTimeStamps

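Adds a timestamp, in seconds since JVM start, to each garbage collection entry. Together with the option above, the log also carries a timing line such as: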

[Times: user=0.09 sys=0.00, real=0.02 secs]

f: -XX:+PrintGCApplicationStoppedTime

Prints how long the application was stopped during garbage collection, e.g.:

Total time for which application threads were stopped: 0.0225920 seconds
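
Lines in this format are easy to total up, which helps when comparing pause times against the ZooKeeper session timeout. A minimal sketch using awk follows; the function name `total_stopped_seconds` is hypothetical and the second sample line is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sum the pauses reported by
# -XX:+PrintGCApplicationStoppedTime lines read from standard input.
total_stopped_seconds() {
    awk '/Total time for which application threads were stopped:/ {
             # the pause length is the field right after "stopped:"
             for (i = 1; i < NF; i++) if ($i == "stopped:") sum += $(i + 1)
         }
         END { printf "%.7f\n", sum }'
}

# Sample input; the second line is invented for this example.
printf '%s\n' \
    'Total time for which application threads were stopped: 0.0225920 seconds' \
    'Total time for which application threads were stopped: 0.0100000 seconds' |
    total_stopped_seconds
```

In practice the function would be fed a real GC log, for example with `total_stopped_seconds < gc.log`, where `gc.log` stands for whatever file the JVM output is written to.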

4: a production web server sample and the tuning process

JAVA_OPTS=" -server -Xmx2g -Xms2g -Xmn256m -XX:PermSize=128m -Xss256k -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 "

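At first we used UseParallelGC and UseParallelOldGC with a 3 GB heap and NewRatio set to 1. With that configuration a young GC happened roughly every 12 to 13 seconds and took about 80 ms on average; full GCs were very rare and took about 1 s each. Judged by the total time spent in GC, throughput was quite good, but for both young and old collections the application threads were paused for too long, which does not suit a web application. We also tried shrinking the New Generation, but that made full GCs take longer.

Later we switched to the CMS GC (-XX:+UseConcMarkSweepGC). The total heap was still 3 GB at that point, with a 1.5 GB new generation. The results were not very good, so we changed to a 2 GB JVM heap with -Xmn1g. Young GCs then happened every 7 to 8 seconds and took about 40-50 ms on average; CMS collections were rare, and init-mark plus remark (the two steps that stop all application threads) took about 80-90 ms in total on average.

At one point we enlarged the New Generation to 1400m out of the 2 GB JVM heap. Each young GC then took about 60-70 ms on average, and the CMS init-mark and remark together averaged about 50 ms. Here we realized that we were heading in the wrong direction, or rather what CMS is actually for, and changed course.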

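Finally we shrank the New Generation to 256m. A young GC now happens every 2 to 3 seconds with an average pause of about 25 ms, and the CMS init-mark and remark together average about 50 ms. The system runs fairly smoothly this way, and under load testing this configuration performed well.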
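
The trade-off in this tuning history can be put into numbers by comparing the share of wall-clock time spent in young GC pauses. The sketch below is only a rough illustration: `young_gc_overhead` is a hypothetical helper, and the intervals of 12.5 s and 2.5 s are assumed midpoints of the ranges given above.

```shell
#!/bin/sh
# Hypothetical helper: percentage of wall-clock time spent in young GC
# pauses, given the average pause (ms) and the interval between GCs (s).
young_gc_overhead() {
    awk -v pause_ms="$1" -v interval_s="$2" \
        'BEGIN { printf "%.2f\n", pause_ms / (interval_s * 1000) * 100 }'
}

# First configuration: about 80 ms every 12.5 s (assumed midpoint).
young_gc_overhead 80 12.5
# Final configuration: about 25 ms every 2.5 s (assumed midpoint).
young_gc_overhead 25 2.5
```

With these assumed inputs the final configuration spends a somewhat larger share of time in young GC (about 1% against about 0.64%), but each individual pause is roughly a third as long, which is what matters for an interactive application.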

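The CMS GC can trigger a collection in two ways: by the collector's own estimate, or by heap occupancy. In our 1.5.0.09 environment there was a case where it collected constantly with the old generation only about 30% full. My impression is that the estimate-based trigger is unreliable and that triggering on the heap occupancy ratio is the safer choice.

All of these figures come from a 64-bit test machine. I dug the numbers out of the JBoss log afterwards rather than recording them at the time, so they may be slightly off, but not by much; the overall process is as described.

5: summary

As an application with high interactivity requirements, a web server should use Parallel + CMS. UseParNewGC is the default new generation GC on JDK 6 -server when CMS is in use. The new generation should not be too large, so that each pause stays short. The CMS mark-sweep generation can be larger and can be sized according to the pause times actually observed.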