hbase集群region数量和大小的影响
1、Region数量的影响
2、region大小的影响
169.2.2. Number of regions per RS - upper bound
In production scenarios, where you have a lot of data, you are normally concerned with the maximum number of regions you can have per server. too many regions has technical discussion on the subject. Basically, the maximum number of regions is mostly determined by memstore memory usage. Each region has its own memstores; these grow up to a configurable size; usually in 128-256 MB range, see hbase.hregion.memstore.flush.size. One memstore exists per column family (so there’s only one per region if there’s one CF in the table). The RS dedicates some fraction of total memory to its memstores (see hbase.regionserver.global.memstore.size). If this memory is exceeded (too much memstore usage), it can cause undesirable consequences such as unresponsive server or compaction storms. A good starting point for the number of regions per RS (assuming one table) is:
((RS memory) * (total memstore fraction)) / ((memstore size)*(# column families))
This formula is pseudo-code. Here are two formulas using the actual tunable parameters, first for HBase 0.98+ and second for HBase 0.94.x.
- HBase 0.98.x
((RS Xmx) * hbase.regionserver.global.memstore.size) / (hbase.hregion.memstore.flush.size * (# column families))
- HBase 0.94.x
((RS Xmx) * hbase.regionserver.global.memstore.upperLimit) / (hbase.hregion.memstore.flush.size * (# column families))+
If a given RegionServer has 16 GB of RAM, with default settings, the formula works out to 16384*0.4/128 ~ 51 regions per RS is a starting point. The formula can be extended to multiple tables; if they all have the same configuration, just use the total number of families.
This number can be adjusted; the formula above assumes all your regions are filled at approximately the same rate. If only a fraction of your regions are going to be actively written to, you can divide the result by that fraction to get a larger region count. Then, even if all regions are written to, all region memstores are not filled evenly, and eventually jitter appears even if they are (due to limited number of concurrent flushes). Thus, one can have as many as 2-3 times more regions than the starting point; however, increased numbers carry increased risk.
For write-heavy workload, memstore fraction can be increased in configuration at the expense of block cache; this will also allow one to have more regions.
169.2.3. Number of regions per RS - lower bound
HBase scales by having regions across many servers. Thus if you have 2 regions for 16GB data, on a 20 node machine your data will be concentrated on just a few machines - nearly the entire cluster will be idle. This really can’t be stressed enough, since a common problem is loading 200MB data into HBase and then wondering why your awesome 10 node cluster isn’t doing anything.
On the other hand, if you have a very large amount of data, you may also want to go for a larger number of regions to avoid having regions that are too large.
169.2.4. Maximum region size
For large tables in production scenarios, maximum region size is mostly limited by compactions - very large compactions, esp. major, can degrade cluster performance. Currently, the recommended maximum region size is 10-20Gb, and 5-10Gb is optimal. For older 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
The size at which the region is split into two is generally configured via hbase.hregion.max.filesize; for details, see arch.region.splits.
If you cannot estimate the size of your tables well, when starting off, it’s probably best to stick to the default region size, perhaps going smaller for hot tables (or manually split hot regions to spread the load over the cluster), or go with larger region sizes if your cell sizes tend to be largish (100k and up).
In HBase 0.98, experimental stripe compactions feature was added that would allow for larger regions, especially for log data. See ops.stripe.
169.2.5. Total data size per region server
According to above numbers for region size and number of regions per region server, in an optimistic estimate 10 GB x 100 regions per RS will give up to 1TB served per region server, which is in line with some of the reported multi-PB use cases. However, it is important to think about the data vs cache size ratio at the RS level. With 1TB of data per server and 10 GB block cache, only 1% of the data will be cached, which may barely cover all block indices.
hbase集群region数量和大小的影响的更多相关文章
- 读者来信-5 | 如果你家HBase集群Region太多请点进来看看,这个问题你可能会遇到
前言:<读者来信>是HBase老店开设的一个问答专栏,旨在能为更多的小伙伴解决工作中常遇到的HBase相关的问题.老店会尽力帮大家解决这些问题或帮你发出求救贴,老店希望这会是一个互帮互助的 ...
- 读者来信 | 如果你家HBase集群Region太多请点进来看看,这个问题你可能会遇到
前言:<读者来信>是HBase老店开设的一个问答专栏,旨在能为更多的小伙伴解决工作中常遇到的HBase相关的问题.老店会尽力帮大家解决这些问题或帮你发出求救贴,老店希望这会是一个互帮互助的 ...
- hbase集群安装与部署
1.相关环境 centos7 hadoop2.6.5 zookeeper3.4.9 jdk1.8 hbase1.2.4 本篇文章仅涉及hbase集群的搭建,关于hadoop与zookeeper的相关部 ...
- Hbase集群搭建及所有配置调优参数整理及API代码运行
最近为了方便开发,在自己的虚拟机上搭建了三节点的Hadoop集群与Hbase集群,hadoop集群的搭建与zookeeper集群这里就不再详细说明,原来的笔记中记录过.这里将hbase配置参数进行相应 ...
- Hbase集群监控
Hbase集群监控 Hbase Jmx监控 监控每个regionServer的总请求数,readRequestsCount,writeRequestCount,region分裂,region合并,St ...
- 基于docker快速搭建hbase集群
一.概述 HBase是一个分布式的.面向列的开源数据库,该技术来源于 Fay Chang 所撰写的Google论文"Bigtable:一个结构化数据的分布式存储系统".就像Bigt ...
- Hadoop hbase集群断电数据块被破坏无法启动
集群机器意外断电重启,导致hbase 无法正常启动,抛出reflect invocation异常,可能是正在执行的插入或合并等操作进行到一半时中断,导致部分数据文件不完整格式不正确或在hdfs上blo ...
- Apache HBase 集群安装文档
简介: Apache HBase 是一个分布式的.面向列的开源 NoSQL 数据库.具有高性能.高可靠性.可伸缩.面向列.分布式存储的特性. HBase 的数据文件最终落地在 HDFS 之上,所以在 ...
- 使用Hbase快照将数据输出到互联网区测试环境的临时Hbase集群
通过snapshot对内网测试环境Hbase生产集群的全量数据(包括原始数据和治理后数据)复制到互联网Hbase临时集群.工具及原理: 1) Hbase自带镜像导出工具(snapsho ...
随机推荐
- 开发板编译./camera显示-/bin/sh: ./camera: not found解决方案
问题: 开发板根文件系统目录: 运行./camera显示: 问题解决: 1.排除根目录路径问题: 2. 加入静态链接库即无问题,但是编译后的".o"文件大小突增,而且也不可能每次编 ...
- linux svn 中文 https://my.oschina.net/VASKS/blog/659236
https://my.oschina.net/VASKS/blog/659236 设置服务器: export LC_ALL=zh_CN.UTF-8长久之计, echo export LC_ALL=zh ...
- UVA101 The Blocks Problem 题解
题目链接:https://www.luogu.org/problemnew/show/UVA101 这题码量稍有点大... 分析: 这道题模拟即可.因为考虑到所有的操作vector可最快捷的实现,所以 ...
- 网页学习:day1
初始准备: Write some function Write a titie Write a article Write some button Button function写法: functio ...
- 【Android UI】侧滑栏的使用(HorizontalScrollView控件的使用)
主要的用到的控件:HorizontalScrollView 主要的功能:把几张图片解析成一张图片,在一个容器中呈现. 布局文件xml side_bar_scollview.xml//显示view的容器 ...
- Netty编码流程及WriteAndFlush()的实现
编码器的执行时机 首先, 我们想通过服务端,往客户端发送数据, 通常我们会调用ctx.writeAndFlush(数据)的方式, 入参位置的数据可能是基本数据类型,也可能对象 其次,编码器同样属于ha ...
- HIVE之 DDL 数据定义 & DML数据操作
DDL数据库定义 创建数据库 1)创建一个数据库,数据库在 HDFS 上的默认存储路径是/user/hive/warehouse/*.db. hive (default)> create dat ...
- nginx CRLF(换行回车)注入漏洞复现
nginx CRLF(换行回车)注入漏洞复现 一.漏洞描述 CRLF是”回车+换行”(\r\n)的简称,其十六进制编码分别为0x0d和0x0a.在HTTP协议中,HTTP header与HTTP Bo ...
- layer设置maxWidth及maxHeight解决方案
layer介绍 layer是一款近年来备受青睐的web弹层组件,她具备全方位的解决方案,致力于服务各水平段的开发人员,您的页面会轻松地拥有丰富友好的操作体验.下载及使用访问官方网站. area属性 l ...
- JavaScript基础学习第六天
目标: 能够使用对象的方式处理数据 ☞ 代码预解析: 1. 变量提升 :当程序中遇到定义变量后,就会将该变量的定义提升到当前作用域的开始位置,不包括变量的赋值 2. 函数提升:当程序中遇到函数的声明时 ...