The official HFileBlock source comment reads:
Reading HFile version 1 and 2 blocks, and writing version 2 blocks.
- In version 1 all blocks are always compressed or uncompressed, as specified by the
HFile's compression algorithm, with a type-specific magic record stored in the beginning of the compressed data (i.e. one needs to uncompress the compressed block to determine the block type). There is only a single compression algorithm setting for all blocks. Offset and size information from the block index are required to read a block.
- In version 2 a block is structured as follows:
- Magic record identifying the block type (8 bytes)
- Compressed block size, header not included (4 bytes)
- Uncompressed block size, header not included (4 bytes)
- The offset of the previous block of the same type (8 bytes). This is used to be able to navigate to the previous block without going to the block
- For minorVersions >=1, there is an additional 4 byte field bytesPerChecksum that records the number of bytes in a checksum chunk.
- For minorVersions >=1, there is a 4 byte value to store the size of data on disk (excluding the checksums)
- For minorVersions >=1, a series of 4 byte checksums, one each for the number of bytes specified by bytesPerChecksum.
- Compressed data (or uncompressed data if compression is disabled). The compression algorithm is the same for all the blocks in the
HFile, similarly to what was done in version 1.
The version 2 block representation in the block cache is the same as above, except that the data section is always uncompressed in the cache.
As the comment shows, HBase has gone through two HFile formats (version 1 and version 2) over its releases, with a corresponding HFileBlock format for each. This article only considers version 2, with minorVersion at its default value of 1, which gives the following HFileBlock version 2 layout:
| magic record (8 bytes) |
| compressed block size (header not included, 4 bytes) |
| uncompressed block size (header not included, 4 bytes) |
| the offset of the previous block of the same type (8 bytes) |
| bytesPerChecksum (the number of bytes in a checksum chunk, 4 bytes) |
| data size on disk (excluding the checksums, 4 bytes) |
| a series of 4 byte checksums (one each for the number of bytes specified by bytesPerChecksum) |
| compressed data (uncompressed data if compression is disabled) |
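To make the field order and sizes concrete, here is a minimal self-contained sketch of the minorVersion >= 1 header layout (not the real HFileBlock code; DATA_MAGIC and all field values are illustrative). Note that the actual on-disk header in minor version 1 also carries a one-byte checksum type id between the previous-block offset and bytesPerChecksum, which the quoted comment does not list:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Sketch of the version 2 block header layout (minorVersion >= 1). */
public class HeaderLayoutSketch {
    // "DATABLK*" -- the magic record used for DATA blocks (8 bytes).
    static final byte[] DATA_MAGIC = "DATABLK*".getBytes();

    static byte[] buildHeader(int onDiskSizeWithoutHeader,
                              int uncompressedSizeWithoutHeader,
                              long prevBlockOffset,
                              byte checksumType,
                              int bytesPerChecksum,
                              int onDiskDataSizeWithHeader) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.write(DATA_MAGIC);                        // 8 bytes: block type magic
            out.writeInt(onDiskSizeWithoutHeader);        // 4 bytes: compressed size
            out.writeInt(uncompressedSizeWithoutHeader);  // 4 bytes: uncompressed size
            out.writeLong(prevBlockOffset);               // 8 bytes: prev block of same type
            out.writeByte(checksumType);                  // 1 byte : checksum type id
            out.writeInt(bytesPerChecksum);               // 4 bytes: checksum chunk size
            out.writeInt(onDiskDataSizeWithHeader);       // 4 bytes: header + data, no checksums
            out.flush();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] header = buildHeader(100, 200, -1L, (byte) 1, 16 * 1024, 133);
        // 8 + 4 + 4 + 8 + 1 + 4 + 4 = 33 bytes for minorVersion >= 1
        System.out.println(header.length);
    }
}
```

With the extra checksum fields, the header grows from the 24 bytes of the no-checksum layout (8 + 4 + 4 + 8) to 33.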
HFileBlock.Writer
Unified version 2 HFile block writer. The intended usage pattern is as follows:
As the comment shows, the intended usage pattern of HFileBlock.Writer is:
(1) Construct an HFileBlock.Writer instance, supplying the compression algorithm, the data block encoding, whether a MemstoreTS follows each key/value, the HBase minor version, the checksum type, and how many bytes of data go into each checksum chunk;
(2) Call startWriting to obtain an output stream to write block data into;
(3) Call writeHeaderAndData as needed to write the accumulated block to an output stream; getHeaderAndData then returns a byte array containing the block that was just written;
(4) Repeat (2) and (3) to write more blocks.
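The steps above can be sketched as a tiny state machine. This is a simplified stand-in (a hypothetical MiniBlockWriter, not the real HFileBlock.Writer) that keeps only the state transitions and the reusable in-memory stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical, simplified sketch of the Writer usage protocol. */
public class MiniBlockWriter {
    enum State { INIT, WRITING, BLOCK_READY }

    private State state = State.INIT;
    private final ByteArrayOutputStream baosInMemory = new ByteArrayOutputStream();
    private DataOutputStream userDataStream;

    DataOutputStream startWriting() {
        baosInMemory.reset();                 // discard the previous block's bytes
        state = State.WRITING;
        userDataStream = new DataOutputStream(baosInMemory);
        return userDataStream;
    }

    byte[] finishAndGetBytes() {
        state = State.BLOCK_READY;            // block can now be written out or cached
        return baosInMemory.toByteArray();
    }

    /** Runs the usage loop twice and returns the last block's length. */
    static int demo() {
        try {
            MiniBlockWriter w = new MiniBlockWriter();
            int len = 0;
            for (int block = 0; block < 2; block++) {    // step (4): repeat per block
                DataOutputStream out = w.startWriting(); // step (2)
                out.writeInt(42);                        // write user data into the stream
                len = w.finishAndGetBytes().length;      // step (3)
            }
            return len;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each loop iteration resets and reuses the same in-memory buffer, which is exactly why step (2) must precede every new block.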
Class diagram
Key fields
private State state = State.INIT;
Writer state. Used to ensure the correct usage protocol.
Identifies the Writer's current state, so that methods can verify the Writer is in the state they require (certain methods may only be called in a particular state). There are three states: INIT, WRITING, and BLOCK_READY.
private final Compression.Algorithm compressAlgo;
Compression algorithm for all blocks this instance writes.
Identifies the compression algorithm this Writer uses for all blocks; the available Compression.Algorithm values are LZO, GZ, NONE, SNAPPY, and LZ4.
private final HFileDataBlockEncoder dataBlockEncoder;
Data block encoder used for data blocks.
The encoder used for data blocks.
private ByteArrayOutputStream baosInMemory;
The stream we use to accumulate data in uncompressed format for each block. We reset this stream at the end of each block and reuse it. The header is written as the first
HFileBlock.headerSize(int) bytes into this stream.
This stream accumulates each block's data in uncompressed form (a data block's payload is a series of KeyValues). Once a block has been written out, reset is called on the stream so it can be reused across blocks.
private Compressor compressor;
private CompressionOutputStream compressionStream;
private ByteArrayOutputStream compressedByteStream;
If compression is enabled, the following three fields are involved:
compressor:Compressor, which is also reused between consecutive blocks.
compressionStream:Compression output stream.
compressedByteStream:Underlying stream to write compressed bytes to.
compressionStream is built by wrapping compressor and compressedByteStream.
private BlockType blockType;
Current block type. Set in startWriting(BlockType). Could be changed in encodeDataBlockForDisk() from BlockType.DATA to BlockType.ENCODED_DATA.
Identifies the current block's type.
private DataOutputStream userDataStream;
A stream that we write uncompressed bytes to, which compresses them and writes them to baosInMemory.
userDataStream is built by wrapping baosInMemory.
private byte[] onDiskBytesWithHeader;
Bytes to be written to the file system, including the header. Compressed if compression is turned on. It also includes the checksum data that immediately follows the block data. (header + data + checksums)
The final bytes written to the filesystem; possibly compressed, and including both the header and the checksum data.
private int onDiskDataSizeWithHeader;
The size of the data on disk that does not include the checksums. (header + data)
The size of the final on-disk bytes, excluding the checksum data (header + data).
private byte[] onDiskChecksum;
The size of the checksum data on disk. It is used only if data is not compressed. If data is compressed, then the checksums are already part of onDiskBytesWithHeader. If data is uncompressed, then this variable stores the checksum data for this block.
The checksum data; it is only used when the data is not compressed. If the data is compressed, the checksums are already part of onDiskBytesWithHeader; otherwise, onDiskChecksum holds this block's checksum data.
private byte[] uncompressedBytesWithHeader;
Valid in the READY state. Contains the header and the uncompressed (but potentially encoded, if this is a data block) bytes, so the length is uncompressedSizeWithoutHeader + HFileBlock.headerSize(int). Does not store checksums.
Contains the header and the uncompressed data (which may already be encoded); does not include checksum data.
private long startOffset;
Current block's start offset in the HFile. Set in writeHeaderAndData(FSDataOutputStream).
The current block's start offset within the HFile.
private long[] prevOffsetByType;
Offset of previous block by block type. Updated when the next block is started.
private long prevOffset;
The offset of the previous block of the same type.
Records the start offset, within the HFile, of the previous block of the same type as the current one; prevOffsetByType keeps the last start offset for every block type this Writer has written, updated when the next block is started.
private boolean includesMemstoreTS;
Whether we are including memstore timestamp after every key/value.
Indicates whether a MemstoreTS is appended after each KeyValue.
private ChecksumType checksumType;
The checksum type; the available ChecksumType values are NULL, CRC32, and CRC32C.
private int bytesPerChecksum;
How many bytes of data go into one checksum chunk.
private final int minorVersion;
The HBase minor version.
Constructing an HFileBlock.Writer instance (the constructor)
this.minorVersion = minorVersion; // defaults to HFileReaderV2.MAX_MINOR_VERSION (1)
compressAlgo = compressionAlgorithm == null ? NONE : compressionAlgorithm; // defaults to Compression.Algorithm.NONE
this.dataBlockEncoder = dataBlockEncoder != null ? dataBlockEncoder : NoOpDataBlockEncoder.INSTANCE; // defaults to NoOpDataBlockEncoder.INSTANCE, which does not transform the data at all.
baosInMemory = new ByteArrayOutputStream();
if (compressAlgo != NONE) {
compressor = compressionAlgorithm.getCompressor();
compressedByteStream = new ByteArrayOutputStream();
try {
compressionStream = compressionAlgorithm.createPlainCompressionStream(compressedByteStream, compressor);
} catch (IOException e) {
throw new RuntimeException("Could not create compression stream " + "for algorithm " + compressionAlgorithm, e);
}
}
Create the byte output stream, plus the corresponding compression output stream if compression is enabled; either way, the underlying sink is a ByteArrayOutputStream. This shows that HFileBlock.Writer never writes data directly to the filesystem: it only buffers it in an in-memory byte array, and the caller takes that array (after several processing steps such as encoding and compression) and writes it to the filesystem.
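The same pipeline shape can be reproduced with JDK classes alone. This sketch substitutes java.util.zip.DeflaterOutputStream for Hadoop's CompressionOutputStream, but the structure matches the constructor above: a compression stream wrapping an in-memory ByteArrayOutputStream, with nothing touching the filesystem:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

/** Sketch: an in-memory compression pipeline analogous to the Writer's setup,
 *  using the JDK's Deflater in place of Hadoop's CompressionOutputStream. */
public class CompressionPipelineSketch {
    /** Compresses n zero bytes entirely in memory and returns the compressed size. */
    static int compressedSize(int n) {
        try {
            ByteArrayOutputStream compressedByteStream = new ByteArrayOutputStream();
            DeflaterOutputStream compressionStream =
                new DeflaterOutputStream(compressedByteStream);
            compressionStream.write(new byte[n]);   // highly compressible zeros
            compressionStream.finish();             // flush the codec's trailing bytes
            return compressedByteStream.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // nothing has touched the filesystem; the caller decides when bytes go out
        System.out.println(compressedSize(1024) < 1024);
    }
}
```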
if (minorVersion > MINOR_VERSION_NO_CHECKSUM && bytesPerChecksum < HEADER_SIZE_WITH_CHECKSUMS) {
throw new RuntimeException("Unsupported value of bytesPerChecksum. " +
" Minimum is " + HEADER_SIZE_WITH_CHECKSUMS + " but the configured value is " +
bytesPerChecksum);
}
Validate bytesPerChecksum.
prevOffsetByType = new long[BlockType.values().length];
for (int i = 0; i < prevOffsetByType.length; ++i)
prevOffsetByType[i] = -1;
Initialize the prevOffsetByType array; its length is the number of block types.
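A minimal sketch of this initialization and how the array is later used (3 here stands in for BlockType.values().length):

```java
import java.util.Arrays;

/** Sketch: tracking the previous offset per block type with a -1-initialized array. */
public class PrevOffsetSketch {
    static long[] initPrevOffsets(int numBlockTypes) {
        long[] prevOffsetByType = new long[numBlockTypes];
        Arrays.fill(prevOffsetByType, -1L); // -1 means "no previous block of that type yet"
        return prevOffsetByType;
    }

    public static void main(String[] args) {
        long[] offsets = initPrevOffsets(3); // stand-in for BlockType.values().length
        offsets[0] = 128;                    // a block of type id 0 finished at offset 128
        System.out.println(offsets[0] + " " + offsets[1]);
    }
}
```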
this.includesMemstoreTS = includesMemstoreTS; // defaults to true
this.checksumType = checksumType; // defaults to ChecksumType.CRC32
this.bytesPerChecksum = bytesPerChecksum; // defaults to 16 * 1024
Typical processing flow and source walkthrough
1. startWriting
Signature: public DataOutputStream startWriting(BlockType newBlockType) throws IOException
Description: Starts writing into the block; the previous block's data is discarded.
Triggered by: HFileWriterV2.newBlock()
Flow:
if (state == State.BLOCK_READY && startOffset != -1) {
// We had a previous block that was written to a stream at a specific
// offset. Save that offset as the last offset of a block of that type.
prevOffsetByType[blockType.getId()] = startOffset;
}
Record the previous block's start offset in the HFile under its block type; this happens only if the previous block was fully written (state is BLOCK_READY).
startOffset = -1;
blockType = newBlockType;
Reset the start offset and record the new block type.
baosInMemory.reset();
baosInMemory.write(getDummyHeaderForVersion(this.minorVersion));
Reset the byte array output stream and write a dummy header: an all-zero byte array of headerSize(minorVersion) bytes (33 bytes for minorVersion 1), reserving space for the real header.
state = State.WRITING;
Set the Writer state to WRITING.
// We will compress it later in finishBlock()
userDataStream = new DataOutputStream(baosInMemory);
Wrap ("decorate") baosInMemory in a DataOutputStream to make external writes convenient.
return userDataStream;
Return the data output stream for external use.
2. Writing KeyValues
// Write length of key and value and then actual key and value bytes.
// Additionally, we may also write down the memstoreTS.
{
DataOutputStream out = fsBlockWriter.getUserDataStream();
out.writeInt(klength);
totalKeyLength += klength;
out.writeInt(vlength);
totalValueLength += vlength;
out.write(key, koffset, klength);
out.write(value, voffset, vlength);
if (this.includeMemstoreTS) {
WritableUtils.writeVLong(out, memstoreTS);
}
}
The code above actually lives in HFileWriterV2.append and is not analyzed in detail here. The point is that callers obtain the DataOutputStream instance out via HFileBlock.Writer.getUserDataStream() and then write their data through out.
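A self-contained sketch of that on-stream layout, using plain JDK streams (the single 0 byte stands in for WritableUtils.writeVLong, since a vlong of 0 encodes as one byte):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Sketch of the per-KeyValue serialization performed by HFileWriterV2.append. */
public class KeyValueWriteSketch {
    static int serializedSize(byte[] key, byte[] value, boolean includeMemstoreTS) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeInt(key.length);    // 4-byte key length
            out.writeInt(value.length);  // 4-byte value length
            out.write(key);              // key bytes
            out.write(value);            // value bytes
            if (includeMemstoreTS) {
                out.writeByte(0);        // stand-in for writeVLong(out, 0L): one byte
            }
            return baos.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = "row1/cf:q1".getBytes();   // 10 bytes (illustrative key)
        byte[] value = "v1".getBytes();         // 2 bytes
        // 4 + 4 + 10 + 2 + 1
        System.out.println(serializedSize(key, value, true));
    }
}
```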
3. writeHeaderAndData
Signature: public void writeHeaderAndData(FSDataOutputStream out) throws IOException
Description: Similar to writeHeaderAndData(DataOutputStream), but records the offset of this block so that it can be referenced in the next block of the same type. Its own job is thus to record the current block's start offset; everything else is delegated to the overload writeHeaderAndData(DataOutputStream). Overall, it writes the accumulated block data to the given output stream, i.e. to HDFS.
Triggered by: the block growing past the size threshold (64KB by default), as measured by blockSizeWritten (i.e. userDataStream.size()), which indirectly triggers this method.
Flow:
long offset = out.getPos();
if (startOffset != -1 && offset != startOffset) {
throw new IOException("A " + blockType + " block written to a "
+ "stream twice, first at offset " + startOffset + ", then at "
+ offset);
}
startOffset = offset;
Record the current block's start offset (the guard above rejects writing the same block to a stream twice).
writeHeaderAndData((DataOutputStream) out);
Hand off to the overload writeHeaderAndData(DataOutputStream), whose flow is as follows:
ensureBlockReady();
Transitions the block writer from the "writing" state to the "block ready" state. Does nothing if a block is already finished.
Transition the Writer from the "writing" state to "block ready"; the actual work is done by finishBlock.
out.write(onDiskBytesWithHeader);
Write out the header and the data; if compression is enabled, this also includes the checksum data.
if (compressAlgo == NONE && minorVersion > MINOR_VERSION_NO_CHECKSUM) {
if (onDiskChecksum == HConstants.EMPTY_BYTE_ARRAY) {
throw new IOException("A " + blockType
+ " without compression should have checksums "
+ " stored separately.");
}
out.write(onDiskChecksum);
}
If compression is not enabled, write out the checksum data separately.
4. finishBlock
Signature: private void finishBlock() throws IOException
Description: An internal method that flushes the compressing stream (if using compression), serializes the header, and takes care of the separate uncompressed stream for caching on write, if applicable. Sets block write state to "block ready".
Triggered by: ensureBlockReady
Flow:
userDataStream.flush();
Flush the output stream.
uncompressedBytesWithHeader = baosInMemory.toByteArray();
Copy the data: save the bytes accumulated in the byte array output stream into a byte array.
prevOffset = prevOffsetByType[blockType.getId()];
Save the start offset of the previous block of the same type as the current block.
state = State.BLOCK_READY;
Set the Writer state to BLOCK_READY; note that at this point the data has not been encoded or compressed yet.
encodeDataBlockForDisk();
Encode the data; the default encoder instance, NoOpDataBlockEncoder, performs no encoding whatsoever.
doCompressionAndChecksumming();
Compress the data and generate the checksums, as follows:
private void doCompressionAndChecksumming() throws IOException {
if ( minorVersion <= MINOR_VERSION_NO_CHECKSUM) {
version20compression();
} else {
version21ChecksumAndCompression();
}
}
minorVersion defaults to 1, so the flow goes to version21ChecksumAndCompression, whose code follows:
// do the compression
if (compressAlgo != NONE) {
  // compression enabled
  // reset the compressed-output stream
  compressedByteStream.reset();
  // write a dummy header into the compressed-output stream
  compressedByteStream.write(DUMMY_HEADER_WITH_CHECKSUM);
  // reset the codec state; compressionStream wraps compressedByteStream
  compressionStream.resetState();
  // feed the data portion of uncompressedBytesWithHeader (header excluded) into the compression stream
  compressionStream.write(uncompressedBytesWithHeader, headerSize(this.minorVersion),
      uncompressedBytesWithHeader.length - headerSize(this.minorVersion));
  // flush and finish the compression stream
  compressionStream.flush();
  compressionStream.finish();
  // generate checksums
  // append dummy checksum bytes to the end of the compressed-output stream as
  // placeholders, reserving room for the real checksums computed below
  onDiskDataSizeWithHeader = compressedByteStream.size(); // data size
  // reserve space for checksums in the output byte stream
  ChecksumUtil.reserveSpaceForChecksums(compressedByteStream,
      onDiskDataSizeWithHeader, bytesPerChecksum);
  // dummy header + (compressed) data + dummy checksums
  onDiskBytesWithHeader = compressedByteStream.toByteArray();
  // fill in the real header
  put21Header(onDiskBytesWithHeader, 0, onDiskBytesWithHeader.length,
      uncompressedBytesWithHeader.length, onDiskDataSizeWithHeader);
  // generate checksums for header and data. The checksums are
  // part of onDiskBytesWithHeader itself.
  // compute the real checksums
  ChecksumUtil.generateChecksums(
      onDiskBytesWithHeader, 0, onDiskDataSizeWithHeader,
      onDiskBytesWithHeader, onDiskDataSizeWithHeader,
      checksumType, bytesPerChecksum);
  // Checksums are already part of onDiskBytesWithHeader
  onDiskChecksum = HConstants.EMPTY_BYTE_ARRAY;
  // build the header for the cache-on-write copy; the data portion of
  // uncompressedBytesWithHeader is not compressed
  //set the header for the uncompressed bytes (for cache-on-write)
  put21Header(uncompressedBytesWithHeader, 0,
      onDiskBytesWithHeader.length + onDiskChecksum.length,
      uncompressedBytesWithHeader.length, onDiskDataSizeWithHeader);
} else {
  // compression disabled
  // If we are not using any compression, then the
  // checksums are written to its own array onDiskChecksum.
  onDiskBytesWithHeader = uncompressedBytesWithHeader;
  onDiskDataSizeWithHeader = onDiskBytesWithHeader.length;
  // compute the checksum length
  int numBytes = (int)ChecksumUtil.numBytes(
      uncompressedBytesWithHeader.length,
      bytesPerChecksum);
  onDiskChecksum = new byte[numBytes];
  // fill in the header
  //set the header for the uncompressed bytes
  put21Header(uncompressedBytesWithHeader, 0,
      onDiskBytesWithHeader.length + onDiskChecksum.length,
      uncompressedBytesWithHeader.length, onDiskDataSizeWithHeader);
  // compute the checksums
  ChecksumUtil.generateChecksums(
      uncompressedBytesWithHeader, 0, uncompressedBytesWithHeader.length,
      onDiskChecksum, 0,
      checksumType, bytesPerChecksum);
}
At this point, if compression is disabled, onDiskBytesWithHeader holds header + data and onDiskChecksum holds the corresponding checksums; if compression is enabled, onDiskBytesWithHeader holds header + compressed data + checksums and onDiskChecksum is EMPTY_BYTE_ARRAY. Step 3, writeHeaderAndData, then writes the block (onDiskBytesWithHeader and onDiskChecksum) out accordingly.
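The size of onDiskChecksum in the uncompressed branch follows from ChecksumUtil.numBytes: one 4-byte checksum per bytesPerChecksum-sized chunk, rounded up. A sketch of that arithmetic (assuming a 4-byte checksum such as CRC32):

```java
/** Sketch: number of checksum bytes for a block, 4 bytes (one CRC32) per chunk. */
public class ChecksumSizeSketch {
    static long numChecksumBytes(long dataSize, int bytesPerChecksum) {
        // ceil(dataSize / bytesPerChecksum) chunks
        long chunks = (dataSize + bytesPerChecksum - 1) / bytesPerChecksum;
        return chunks * 4;  // one 4-byte checksum per chunk
    }

    public static void main(String[] args) {
        // a 70000-byte header+data region with the default 16KB chunk size -> 5 chunks
        System.out.println(numChecksumBytes(70000, 16 * 1024));
    }
}
```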
Looping over these steps writes out the HFile blocks one after another.