HBase BlockCache

1. Cache reads and writes

Call path:

HMaster.handleCreateTable -> HRegion.createHRegion -> HRegion.initialize -> initializeRegionInternals -> instantiateHStore

-> Store.Store -> new CacheConfig(conf, family) -> CacheConfig.instantiateBlockCache -> new LruBlockCache

Constructor parameters:

/**
* Configurable constructor. Use this constructor if not using defaults.
* @param maxSize maximum size of this cache, in bytes
* @param blockSize expected average size of blocks, in bytes
* @param evictionThread whether to run evictions in a bg thread or not
* @param mapInitialSize initial size of backing ConcurrentHashMap
* @param mapLoadFactor initial load factor of backing ConcurrentHashMap
* @param mapConcurrencyLevel initial concurrency factor for backing CHM
* @param minFactor percentage of total size that eviction will evict until
* @param acceptableFactor percentage of total size that triggers eviction
* @param singleFactor percentage of total size for single-access blocks
* @param multiFactor percentage of total size for multiple-access blocks
* @param memoryFactor percentage of total size for in-memory blocks
*/
public LruBlockCache(long maxSize, long blockSize, boolean evictionThread,
int mapInitialSize, float mapLoadFactor, int mapConcurrencyLevel,
float minFactor, float acceptableFactor,
float singleFactor, float multiFactor, float memoryFactor)

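Besides setting the default parameters, new LruBlockCache also creates the evictionThread, which waits until it is signaled, and a StatisticsThread that periodically logs cache statistics.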

When HFileReaderV2.readBlock runs, it first checks whether the block cache is enabled. If it is, the reader tries to serve the block from the cache:

// Check cache for block. If found return.
if (cacheConf.isBlockCacheEnabled()) {
// Try and get the block from the block cache. If the useLock variable is true then this
// is the second time through the loop and it should not be counted as a block cache miss.
HFileBlock cachedBlock = (HFileBlock)
cacheConf.getBlockCache().getBlock(cacheKey, cacheBlock, useLock);
if (cachedBlock != null) {
BlockCategory blockCategory =
cachedBlock.getBlockType().getCategory();
getSchemaMetrics().updateOnCacheHit(blockCategory, isCompaction);
if (cachedBlock.getBlockType() == BlockType.DATA) {
HFile.dataBlockReadCnt.incrementAndGet();
}
validateBlockType(cachedBlock, expectedBlockType);
// Validate encoding type for encoded blocks. We include encoding
// type in the cache key, and we expect it to match on a cache hit.
if (cachedBlock.getBlockType() == BlockType.ENCODED_DATA &&
cachedBlock.getDataBlockEncoding() !=
dataBlockEncoder.getEncodingInCache()) {
throw new IOException("Cached block under key " + cacheKey + " " +
"has wrong encoding: " + cachedBlock.getDataBlockEncoding() +
" (expected: " + dataBlockEncoder.getEncodingInCache() + ")");
}
return cachedBlock;
}
// Carry on, please load.
}

getBlock updates some statistics. More importantly, the cb.access call promotes the block from BlockPriority.SINGLE to BlockPriority.MULTI:

public Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat) {
CachedBlock cb = map.get(cacheKey);
if(cb == null) {
if (!repeat) stats.miss(caching);
return null;
}
stats.hit(caching);
cb.access(count.incrementAndGet());
return cb.getBuffer();
}
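
The promotion on access can be sketched as follows. This is a minimal illustration, not the HBase source: SketchCachedBlock and its members are hypothetical names, and the sketch assumes that the value passed to access is a logical access stamp taken from a shared counter (as count.incrementAndGet() suggests), not a wall-clock time.

```java
/** Hypothetical stand-in for a cached block; names are illustrative only. */
class SketchCachedBlock {
    enum Priority { SINGLE, MULTI, MEMORY }

    private long accessTime;   // logical access stamp, not wall-clock time
    private Priority priority;

    SketchCachedBlock(long accessTime, boolean inMemory) {
        this.accessTime = accessTime;
        // In-memory blocks get their own class; everything else starts as SINGLE.
        this.priority = inMemory ? Priority.MEMORY : Priority.SINGLE;
    }

    /** Record a cache hit: refresh the stamp and promote SINGLE to MULTI. */
    void access(long accessTime) {
        this.accessTime = accessTime;
        if (priority == Priority.SINGLE) {
            priority = Priority.MULTI;
        }
    }

    Priority getPriority() { return priority; }

    long getAccessTime() { return accessTime; }
}
```

A block that is only ever read once therefore stays SINGLE, which is what later lets eviction treat one-off reads differently from repeatedly read blocks.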

———————

If the block was not found in the cache (a first read), it is read from the HFile and then added to the cache:

// Cache the block if necessary
if (cacheBlock && cacheConf.shouldCacheBlockOnRead(
hfileBlock.getBlockType().getCategory())) {
cacheConf.getBlockCache().cacheBlock(cacheKey, hfileBlock,
cacheConf.isInMemory());
}
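
Taken together, the two excerpts form a read-through pattern: look in the cache, load on a miss, then cache the result if the caller asked for it. The sketch below shows only that control flow. ReadThroughSketch, the String keys, and the loader function are assumptions made for illustration; the real reader works with BlockCacheKey and HFileBlock and is thread-safe, which this sketch is not.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache; the loader stands in for the HFile read. */
class ReadThroughSketch {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> loader;
    int hits = 0;     // plain counters: the sketch is single-threaded
    int misses = 0;

    ReadThroughSketch(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    byte[] readBlock(String key, boolean cacheBlock) {
        byte[] cached = cache.get(key);
        if (cached != null) {
            hits++;
            return cached;              // served from the cache
        }
        misses++;
        byte[] loaded = loader.apply(key);
        if (cacheBlock) {
            cache.put(key, loaded);     // cache only when the caller asks for it
        }
        return loaded;
    }
}
```

Passing cacheBlock = false mirrors a read that should not populate the cache: every such read is a miss and leaves the cache unchanged.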

2. LRU eviction

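Writing to the cache adds the block to a ConcurrentHashMap and updates the metrics. It then checks if(newSize > acceptableSize() && !evictionInProgress). acceptableSize is computed from values given at initialization as (long)Math.floor(this.maxSize * this.acceptableFactor), where acceptableFactor is a configurable fraction: "hbase.lru.blockcache.acceptable.factor" (0.85f). In other words, if the total size exceeds this threshold and no eviction is already in progress, an eviction is run.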

/**
* Cache the block with the specified name and buffer.
* <p>
* It is assumed this will NEVER be called on an already cached block. If
* that is done, an exception will be thrown.
* @param cacheKey block’s cache key
* @param buf block buffer
* @param inMemory if block is in-memory
*/
public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
CachedBlock cb = map.get(cacheKey);
if(cb != null) {
throw new RuntimeException("Cached an already cached block");
}
cb = new CachedBlock(cacheKey, buf, count.incrementAndGet(), inMemory);
long newSize = updateSizeMetrics(cb, false);
map.put(cacheKey, cb);
elements.incrementAndGet();
if(newSize > acceptableSize() && !evictionInProgress) {
runEviction();
}
}
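
To make the thresholds concrete, here is a small worked sketch of the arithmetic. EvictionThresholds and its method names are hypothetical helpers; the formulas follow the Math.floor expressions quoted in this article, and the 1000-byte cache is an artificially small value chosen so the numbers are easy to follow.

```java
/** Hypothetical helper mirroring the threshold formulas quoted in the article. */
class EvictionThresholds {
    /** Size above which an eviction is triggered. */
    static long acceptableSize(long maxSize, float acceptableFactor) {
        return (long) Math.floor(maxSize * acceptableFactor);
    }

    /** Size that an eviction tries to shrink the cache down to. */
    static long minSize(long maxSize, float minFactor) {
        return (long) Math.floor(maxSize * minFactor);
    }

    /** Same condition as in cacheBlock: strictly over the threshold and no eviction running. */
    static boolean shouldEvict(long newSize, long maxSize,
                               float acceptableFactor, boolean evictionInProgress) {
        return newSize > acceptableSize(maxSize, acceptableFactor) && !evictionInProgress;
    }

    /** Bytes an eviction aims to free, given the current cache size. */
    static long bytesToFree(long currentSize, long maxSize, float minFactor) {
        return currentSize - minSize(maxSize, minFactor);
    }
}
```

With maxSize = 1000 and the factors 0.85f and 0.75f, eviction starts once the cache holds more than 850 bytes and aims to bring it back down to 750 bytes, so the gap between the two factors is the headroom available before the next eviction.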

In the evict method:

1. Compute the current total size and the number of bytes to free. minSize = (long)Math.floor(this.maxSize * this.minFactor), where minFactor is configurable via "hbase.lru.blockcache.min.factor" (0.75f):

long currentSize = this.size.get();
long bytesToFree = currentSize - minSize();

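2. Initialize three BlockBuckets (bucketSingle, bucketMulti, bucketMemory), then iterate over the map and add each block to the queue of the bucket matching its priority (MinMaxPriorityQueue.expectedSize(initialSize).create();). Each queue is kept in descending order of access stamp, the counter value recorded on the block's last access, so the most recently accessed blocks come first.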

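The three types differ as follows:

    SINGLE: blocks that have been read once

    MULTI: blocks that have been read more than once

    MEMORY: blocks of column families that have IN_MEMORY set to true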

// Instantiate priority buckets
BlockBucket bucketSingle = new BlockBucket(bytesToFree, blockSize,
singleSize());
BlockBucket bucketMulti = new BlockBucket(bytesToFree, blockSize,
multiSize());
BlockBucket bucketMemory = new BlockBucket(bytesToFree, blockSize,
memorySize());

By default, the cache is divided among the three BlockBuckets in these proportions:

static final float DEFAULT_SINGLE_FACTOR = 0.25f;

static final float DEFAULT_MULTI_FACTOR = 0.50f;

static final float DEFAULT_MEMORY_FACTOR = 0.25f;

private long singleSize() {
return (long)Math.floor(this.maxSize * this.singleFactor * this.minFactor);
}
private long multiSize() {
return (long)Math.floor(this.maxSize * this.multiFactor * this.minFactor);
}
private long memorySize() {
return (long)Math.floor(this.maxSize * this.memoryFactor * this.minFactor);
}

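The three BlockBuckets are then added to a priority queue ordered by overflow (totalSize - bucketSize). For each bucket, the number of bytes to free is computed and then freed: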

PriorityQueue<BlockBucket> bucketQueue =
new PriorityQueue<BlockBucket>(3);
bucketQueue.add(bucketSingle);
bucketQueue.add(bucketMulti);
bucketQueue.add(bucketMemory);
int remainingBuckets = 3;
long bytesFreed = 0;
BlockBucket bucket;
while((bucket = bucketQueue.poll()) != null) {
long overflow = bucket.overflow();
if(overflow > 0) {
long bucketBytesToFree = Math.min(overflow,
(bytesToFree - bytesFreed) / remainingBuckets);
bytesFreed += bucket.free(bucketBytesToFree);
}
remainingBuckets--;
}
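
The effect of this loop is easier to see with numbers. The sketch below replays it on byte totals only. BucketFreeSketch and Bucket are hypothetical names, and the sketch makes two assumptions: the priority queue yields the bucket with the smallest overflow first, and each bucket frees exactly the number of bytes requested (a real bucket frees whole blocks, so it may free somewhat more).

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical replay of the bucket loop, tracking byte totals only. */
class BucketFreeSketch {
    static final class Bucket {
        final String name;
        final long totalSize;   // bytes currently held by this priority class
        final long bucketSize;  // bytes this class is entitled to keep

        Bucket(String name, long totalSize, long bucketSize) {
            this.name = name;
            this.totalSize = totalSize;
            this.bucketSize = bucketSize;
        }

        long overflow() {
            return totalSize - bucketSize;
        }
    }

    /** Returns the bytes freed per bucket, in the order the buckets were processed. */
    static Map<String, Long> plan(long bytesToFree, Bucket... buckets) {
        // Assumption: smallest overflow is polled first.
        PriorityQueue<Bucket> queue =
            new PriorityQueue<>(Comparator.comparingLong(Bucket::overflow));
        for (Bucket b : buckets) {
            queue.add(b);
        }
        int remainingBuckets = buckets.length;
        long bytesFreed = 0;
        Map<String, Long> freedPerBucket = new LinkedHashMap<>();
        Bucket bucket;
        while ((bucket = queue.poll()) != null) {
            long overflow = bucket.overflow();
            long freed = 0;
            if (overflow > 0) {
                // Never take more than the bucket's overflow or its share of what is left.
                freed = Math.min(overflow, (bytesToFree - bytesFreed) / remainingBuckets);
                bytesFreed += freed;
            }
            freedPerBucket.put(bucket.name, freed);
            remainingBuckets--;
        }
        return freedPerBucket;
    }
}
```

For a 1000-byte cache with the default factors, the bucket limits are 187, 375 and 187 bytes. If the buckets hold 400 (single), 400 (multi) and 51 (memory) bytes and 101 bytes must be freed, the memory bucket is under its limit and frees nothing, the multi bucket frees its 25 bytes of overflow, and the single bucket frees the remaining 76. Processing small overflows first lets the share that a bucket cannot supply roll over to the buckets that are further over their limit.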

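The free method takes blocks out of the queue one at a time. Because the queue is in descending order of access stamp, polling from the tail returns the least recently accessed blocks first. Each block is removed from the map and the metrics are updated.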

public long free(long toFree) {
CachedBlock cb;
long freedBytes = 0;
while ((cb = queue.pollLast()) != null) {
freedBytes += evictBlock(cb);
if (freedBytes >= toFree) {
return freedBytes;
}
}
return freedBytes;
}
protected long evictBlock(CachedBlock block) {
map.remove(block.getCacheKey());
updateSizeMetrics(block, true);
elements.decrementAndGet();
stats.evicted();
return block.heapSize();
}

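Evicting from the tail can be sketched with a sorted set standing in for the MinMaxPriorityQueue. TailEvictionSketch and Block are hypothetical names, and the sketch assumes every block has a unique access stamp (a TreeSet would otherwise treat two blocks with the same stamp as duplicates).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical bucket that frees its least recently accessed blocks first. */
class TailEvictionSketch {
    static final class Block {
        final String key;
        final long accessTime;  // logical access stamp, assumed unique
        final long heapSize;    // bytes

        Block(String key, long accessTime, long heapSize) {
            this.key = key;
            this.accessTime = accessTime;
            this.heapSize = heapSize;
        }
    }

    // Newest stamp first, so the tail of the set is the least recently accessed block.
    private final TreeSet<Block> queue = new TreeSet<>(
        Comparator.comparingLong((Block b) -> b.accessTime).reversed());

    final List<String> evictedKeys = new ArrayList<>();

    void add(Block block) {
        queue.add(block);
    }

    /** Evict from the tail until at least toFree bytes are freed or the queue is empty. */
    long free(long toFree) {
        long freedBytes = 0;
        Block block;
        while ((block = queue.pollLast()) != null) {
            evictedKeys.add(block.key);
            freedBytes += block.heapSize;
            if (freedBytes >= toFree) {
                return freedBytes;
            }
        }
        return freedBytes;
    }
}
```

Because whole blocks are evicted, the bytes actually freed can exceed the request: with three 40-byte blocks, asking for 50 bytes evicts the two oldest blocks and frees 80.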
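3. The distinguishing feature of HBase's LruBlockCache is that it treats blocks differently depending on how often they have been accessed. This keeps one-off bulk reads (such as a Scan) from constantly churning the cache, which helps read performance.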
