图片:
RegionServer Split Process
  1. The RegionServer decides locally to split the region, and prepares the split. THE SPLIT TRANSACTION IS STARTED.As a first step, the RegionServer acquires a shared read lock on the table to prevent schema modifications during the splitting process. Then it creates a znode in zookeeper under /hbase/region-in-transition/region-name, and sets the znode’s state to SPLITTING.

  2. The Master learns about this znode, since it has a watcher for the parent region-in-transition znode.

  3. The RegionServer creates a sub-directory named .splits under the parent’s region directory in HDFS.

  4. The RegionServer closes the parent region and marks the region as offline in its local data structures. THE SPLITTING REGION IS NOW OFFLINE. At this point, client requests coming to the parent region will throwNotServingRegionException. The client will retry with some backoff. The closing region is flushed.

  5. The RegionServer creates region directories under the .splits directory, for daughter regions A and B, and creates necessary data structures. Then it splits the store files, in the sense that it creates two Reference files per store file in the parent region. Those reference files will point to the parent region’s files.

  6. The RegionServer creates the actual region directory in HDFS, and moves the reference files for each daughter.

  7. The RegionServer sends a Put request to the .META. table, to set the parent as offline in the .META. table and add information about daughter regions. At this point, there won’t be individual entries in .META. for the daughters. Clients will see that the parent region is split if they scan .META., but won’t know about the daughters until they appear in .META.. Also, if this Put to .META. succeeds, the parent will be effectively split. If the RegionServer fails before this RPC succeeds, Master and the next Region Server opening the region will clean dirty state about the region split. After the .META. update, though, the region split will be rolled-forward by Master.

  8. The RegionServer opens daughters A and B in parallel.

  9. The RegionServer adds the daughters A and B to .META., together with information that it hosts the regions.THE SPLIT REGIONS (DAUGHTERS WITH REFERENCES TO PARENT) ARE NOW ONLINE. After this point, clients can discover the new regions and issue requests to them. Clients cache the .META. entries locally, but when they make requests to the RegionServer or .META., their caches will be invalidated, and they will learn about the new regions from .META..

  10. The RegionServer updates znode /hbase/region-in-transition/region-name in ZooKeeper to state SPLIT, so that the master can learn about it. The balancer can freely re-assign the daughter regions to other region servers if necessary. THE SPLIT TRANSACTION IS NOW FINISHED.

  11. After the split, .META. and HDFS will still contain references to the parent region. Those references will be removed when compactions in daughter regions rewrite the data files. Garbage collection tasks in the master periodically check whether the daughter regions still refer to the parent region’s files. If not, the parent region will be removed.

  12. 本文链接地址:http://hbase.apache.org/book.html#regionserver.arch

RegionServer Splitting Implementation:regionServer 分裂过程分析的更多相关文章

  1. HBase RegionServer Splitting 流程

    RegionServer Splitting 实现 HBase 中的写请求由 Region Server 处理,这些数据首先存储在 memstore (RegionServer 里的一个存储系统)里. ...

  2. HBASE 优化之REGIONSERVER

    HBASE 优化之REGIONSERVER 一,概述 本人在使用优化regionserver的过程有些心得,借此随笔的机会,向大家介绍我的心得,有些是网上拿来的有些是自己在使用过程自己的经验,希望对大 ...

  3. HBase–RegionServer宕机恢复原理

    Region Server宕机总述 HBase一个很大的特色是扩展性极其友好,可以通过简单地加机器实现集群规模的线性扩展,而且机器的配置并不需要太好,通过大量廉价机器代替价格昂贵的高性能机器.但也正因 ...

  4. Hbase学习02

    第2章 Apache HBase配置 本章在“入门”一章中进行了扩展,以进一步解释Apache HBase的配置. 请仔细阅读本章,特别是基本先决条件,确保您的HBase测试和部署顺利进行,并防止数据 ...

  5. hbase官方文档(转)

    FROM:http://www.just4e.com/hbase.html Apache HBase™ 参考指南  HBase 官方文档中文版 Copyright © 2012 Apache Soft ...

  6. HBase官方文档

    HBase官方文档 目录 序 1. 入门 1.1. 介绍 1.2. 快速开始 2. Apache HBase (TM)配置 2.1. 基础条件 2.2. HBase 运行模式: 独立和分布式 2.3. ...

  7. Hbase集群搭建及所有配置调优参数整理及API代码运行

    最近为了方便开发,在自己的虚拟机上搭建了三节点的Hadoop集群与Hbase集群,hadoop集群的搭建与zookeeper集群这里就不再详细说明,原来的笔记中记录过.这里将hbase配置参数进行相应 ...

  8. HBase学习-HBase原理

    1.系统架构 1.1 图解   从HBase的架构图上可以看出,HBase中的组件包括Client.Zookeeper.HMaster.HRegionServer.HRegion.Store.MemS ...

  9. HBase 架构与工作原理5 - Region 的部分特性

    本文系转载,如有侵权,请联系我:likui0913@gmail.com Region Region 是表格可用性和分布的基本元素,由列族(Column Family)构成的 Store 组成.对象的层 ...

随机推荐

  1. pssh系列命令详解

    安装 pssh提供OpenSSH和相关工具的并行版本.包括pssh,pscp,prsync,pnuke和pslurp.该项目包括psshlib,可以在自定义应用程序中使用.pssh是python写的可 ...

  2. 基于名称快速定位文件和文件夹的搜索工具Everything和dll依赖查询工具Dependency Walker

    在工作中有时需要定位头文件.lib库文件.dll文件等的路径,自己去一个个盘符查找实在太麻烦,最近发现使用Everything这款工具很方便,下载地址为:下载 Everything 1.4.1.935 ...

  3. 搭建干净的Mac开发学习环境

    docker + linux + gcc/g++ https://www.jianshu.com/p/d113db99fe24 https://www.jianshu.com/p/d26140d20c ...

  4. 组件化框架设计之阿里巴巴开源路由框架——ARouter原理分析(一)

    阿里P7移动互联网架构师进阶视频(每日更新中)免费学习请点击:https://space.bilibili.com/474380680 背景 当项目的业务越来越复杂,业务线越来越多的时候,就需要按照业 ...

  5. pycharm内对python文件的模板

    #!/usr/bin/env python# -*- coding: utf-8 -*-# @Time : ${DATE} ${TIME}# @Author : Aries# @Site : ${SI ...

  6. Ubuntu下配置了ssh,但是连接很慢

    ssh登录服务器时总是要停顿等待一下才能连接上,这是因为OpenSSH服务器有一个DNS查找选项UseDNS默认是打开的. UseDNS选项打开状态下,当客户端试图登录OpenSSH服务器时,服务器端 ...

  7. 对struct typedef *的认识

    typedef struct node { ……… }NODE,*PNODE; 应该等价于 typedef struct node NODE;//struct node = NODE,eg:struc ...

  8. 生成token

    利用中间件生成token 1.安装中间件 npm install jsonwebtoken 2. 使用 Sign() 里面有3个参数,第一个是token里面传递的数据    ,第二个是 key ,第三 ...

  9. 关于云计算三大服务模式LAAS,PAAS,SAAS的含义及区别

    根据NIST的权威定义,云计算有SPI,即SAAS,PAAS和LAAS三大服务模式,上层是SAAS,中间层是PAAS,底层是LAAS,一层支撑一层. LAAS(Infrastucture-as-a-S ...

  10. 基于c语言数据结构+严蔚敏——线性表章节源码,利用Codeblocks编译通过

    白天没屌事,那我们就来玩玩线性表的实现吧,快要失业了,没饭吃了咋整哦 题目描述假设利用两个线性表LA和LB分别表示两个集合A和B(即:线性表中的数据元素即为集合中的成员),现要求一个新的集合A=A∪B ...