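Data storage in Kafka and RocketMQ

Kafka version: 1.1.1

Each partition maps to one directory. Its data is stored in segment files, and a segment is 1 GB by default.

Partition directory:

Segment files:

How are segments named?

Kafka's segment roll logic: kafka.log.Log#roll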

  /**
   * Roll the log over to a new active segment starting with the current logEndOffset.
   * This will trim the index to the exact size of the number of entries it currently contains.
   *
   * @return The newly rolled segment
   */
  def roll(expectedNextOffset: Option[Long] = None): LogSegment = {
    maybeHandleIOException(s"Error while rolling log segment for $topicPartition in dir ${dir.getParent}") {
      val start = time.hiResClockMs()
      lock synchronized {
        checkIfMemoryMappedBufferClosed()
        val newOffset = math.max(expectedNextOffset.getOrElse(0L), logEndOffset)
        // the log file, e.g. 00000000000030898257.log
        val logFile = Log.logFile(dir, newOffset)
        if (segments.containsKey(newOffset)) {
          // segment with the same base offset already exists and loaded
          if (activeSegment.baseOffset == newOffset && activeSegment.size == 0) {
            // We have seen this happen (see KAFKA-6388) after shouldRoll() returns true for an
            // active segment of size zero because of one of the indexes is "full" (due to _maxEntries == 0).
            warn(s"Trying to roll a new log segment with start offset $newOffset " +
                 s"=max(provided offset = $expectedNextOffset, LEO = $logEndOffset) while it already " +
                 s"exists and is active with size 0. Size of time index: ${activeSegment.timeIndex.entries}," +
                 s" size of offset index: ${activeSegment.offsetIndex.entries}.")
            deleteSegment(activeSegment)
          } else {
            throw new KafkaException(s"Trying to roll a new log segment for topic partition $topicPartition with start offset $newOffset" +
                                     s" =max(provided offset = $expectedNextOffset, LEO = $logEndOffset) while it already exists. Existing " +
                                     s"segment is ${segments.get(newOffset)}.")
          }
        } else if (!segments.isEmpty && newOffset < activeSegment.baseOffset) {
          throw new KafkaException(
            s"Trying to roll a new log segment for topic partition $topicPartition with " +
            s"start offset $newOffset =max(provided offset = $expectedNextOffset, LEO = $logEndOffset) lower than start offset of the active segment $activeSegment")
        } else {
          val offsetIdxFile = offsetIndexFile(dir, newOffset)
          val timeIdxFile = timeIndexFile(dir, newOffset)
          val txnIdxFile = transactionIndexFile(dir, newOffset)
          for (file <- List(logFile, offsetIdxFile, timeIdxFile, txnIdxFile) if file.exists) {
            warn(s"Newly rolled segment file ${file.getAbsolutePath} already exists; deleting it first")
            Files.delete(file.toPath)
          }
          Option(segments.lastEntry).foreach(_.getValue.onBecomeInactiveSegment())
        }
        // take a snapshot of the producer state to facilitate recovery. It is useful to have the snapshot
        // offset align with the new segment offset since this ensures we can recover the segment by beginning
        // with the corresponding snapshot file and scanning the segment data. Because the segment base offset
        // may actually be ahead of the current producer state end offset (which corresponds to the log end offset),
        // we manually override the state offset here prior to taking the snapshot.
        producerStateManager.updateMapEndOffset(newOffset)
        producerStateManager.takeSnapshot()
        val segment = LogSegment.open(dir,
          baseOffset = newOffset,
          config,
          time = time,
          fileAlreadyExists = false,
          initFileSize = initFileSize,
          preallocate = config.preallocate)
        addSegment(segment)
        // We need to update the segment base offset and append position data of the metadata when log rolls.
        // The next offset should not change.
        updateLogEndOffset(nextOffsetMetadata.messageOffset)
        // schedule an asynchronous flush of the old segment
        scheduler.schedule("flush-log", () => flush(newOffset), delay = 0L)
        info(s"Rolled new log segment at offset $newOffset in ${time.hiResClockMs() - start} ms.")
        segment
      }
    }
  }

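As the code shows, the new segment is named after newOffset, which is normally the current logEndOffset. In other words, a segment file is named after the offset of its first message.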
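To make the naming rule concrete, here is a minimal Java sketch (not Kafka's own code). It assumes the 20-digit zero-padded form seen in the file name 00000000000030898257.log above, and it uses a TreeMap keyed by base offset to show why this naming makes it easy to find the segment that holds a given offset. The class name, method names and the extra base offset 15449128 are made up for illustration.

```java
import java.util.Locale;
import java.util.TreeMap;

final class SegmentNames {
    // Zero-pad the base offset to 20 digits, e.g. 30898257 -> 00000000000030898257.log
    static String logFileName(long baseOffset) {
        return String.format(Locale.ROOT, "%020d", baseOffset) + ".log";
    }

    // The segment holding targetOffset is the one with the greatest
    // base offset that is <= targetOffset (null if there is none).
    static Long segmentFor(TreeMap<Long, String> segments, long targetOffset) {
        return segments.floorKey(targetOffset);
    }

    public static void main(String[] args) {
        TreeMap<Long, String> segments = new TreeMap<>();
        // Hypothetical base offsets of three segments in one partition directory.
        for (long base : new long[] {0L, 15_449_128L, 30_898_257L}) {
            segments.put(base, logFileName(base));
        }
        Long base = segmentFor(segments, 20_000_000L);
        System.out.println(segments.get(base));
    }
}
```

Because the names sort in offset order, a sorted map (or even a sorted directory listing) is enough to locate the right segment.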

Next to each log file there is an index file with the same base name. It stores offset/position pairs; one entry consists of 2 ints, 8 bytes in total.

kafka.log.OffsetIndex#append

  /**
   * Append an entry for the given offset/location pair to the index. This entry must have a larger offset than all subsequent entries.
   */
  def append(offset: Long, position: Int) {
    inLock(lock) {
      require(!isFull, "Attempt to append to a full index (size = " + _entries + ").")
      if (_entries == 0 || offset > _lastOffset) {
        trace(s"Adding index entry $offset => $position to ${file.getAbsolutePath}")
        // offset relative to the segment's base offset
        mmap.putInt((offset - baseOffset).toInt)
        // physical position of the message in the log file
        mmap.putInt(position)
        _entries += 1
        _lastOffset = offset
        require(_entries * entrySize == mmap.position(), entries + " entries but file position in index is " + mmap.position() + ".")
      } else {
        throw new InvalidOffsetException(s"Attempt to append an offset ($offset) to position $entries no larger than" +
          s" the last offset appended (${_lastOffset}) to ${file.getAbsolutePath}.")
      }
    }
  }
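The 8-byte entry layout can be sketched in Java as follows. This is a simplified stand-in, not Kafka's OffsetIndex: it uses a heap ByteBuffer instead of a memory-mapped file, and the class and method names are assumptions. The lookup does a binary search for the last entry whose offset is less than or equal to the target and returns its file position, from which a reader would scan forward in the log file.

```java
import java.nio.ByteBuffer;

final class OffsetIndexSketch {
    static final int ENTRY_SIZE = 8; // 4-byte relative offset + 4-byte file position

    private final long baseOffset;
    private final ByteBuffer buf;
    private int entries = 0;
    private long lastOffset = -1L;

    OffsetIndexSketch(long baseOffset, int maxEntries) {
        this.baseOffset = baseOffset;
        this.buf = ByteBuffer.allocate(maxEntries * ENTRY_SIZE);
    }

    // Offsets must be appended in strictly increasing order.
    void append(long offset, int position) {
        if (entries > 0 && offset <= lastOffset) {
            throw new IllegalArgumentException("offset " + offset + " not larger than " + lastOffset);
        }
        buf.putInt((int) (offset - baseOffset)); // relative offset
        buf.putInt(position);                    // physical position in the log file
        entries++;
        lastOffset = offset;
    }

    // Returns the file position of the last entry with offset <= target,
    // or 0 (start of the log file) when no such entry exists.
    int lookup(long target) {
        long rel = target - baseOffset;
        int lo = 0, hi = entries - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (buf.getInt(mid * ENTRY_SIZE) <= rel) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? 0 : buf.getInt(found * ENTRY_SIZE + 4);
    }
}
```

Storing the offset relative to the segment's base offset is what lets each entry fit in 4 + 4 bytes even though message offsets are 64-bit.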

A borrowed diagram:

http://rocketmq.cloud/zh-cn/docs/design-store.html

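RocketMQ stores data differently from Kafka. Its storage is split into commitlog and consumequeue:

Messages of all topics are stored together in the commitlog files. The commitlog is split into 1 GB files by default, and each file is named after its physical offset.

The index information is kept under the consumequeue/topic/queue directory. Each entry is a fixed 20 bytes: an 8-byte commitlog physical offset, a 4-byte message length, and an 8-byte tag hashcode.

The on-disk formats of commitLog and consumeQueue can be derived from the code.

Default file sizes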

// org.apache.rocketmq.store.config.MessageStoreConfig
// CommitLog file size, default is 1G
private int mapedFileSizeCommitLog = 1024 * 1024 * 1024;
// ConsumeQueue file size, default is 30W (300,000 entries), about 6 MB
private int mapedFileSizeConsumeQueue = 300000 * ConsumeQueue.CQ_STORE_UNIT_SIZE;
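As a worked example of these sizes, the Java sketch below computes the consumeQueue file size and maps a commitlog physical offset to the file that contains it. It assumes file names are the file's starting offset zero-padded to 20 digits; the class and method names are made up for illustration and are not RocketMQ APIs.

```java
import java.util.Locale;

final class MappedFileMath {
    static final long COMMIT_LOG_FILE_SIZE = 1024L * 1024 * 1024;                 // 1 GB
    static final int CQ_STORE_UNIT_SIZE = 20;                                     // bytes per consumeQueue entry
    static final long CONSUME_QUEUE_FILE_SIZE = 300_000L * CQ_STORE_UNIT_SIZE;    // 6,000,000 bytes

    // Assumed naming: the starting offset of the file, zero-padded to 20 digits.
    static String fileName(long startOffset) {
        return String.format(Locale.ROOT, "%020d", startOffset);
    }

    // Name of the commitlog file that contains the given physical offset.
    static String commitLogFileFor(long physicalOffset) {
        long start = physicalOffset - (physicalOffset % COMMIT_LOG_FILE_SIZE);
        return fileName(start);
    }

    // Byte position of the given physical offset inside its commitlog file.
    static long positionInFile(long physicalOffset) {
        return physicalOffset % COMMIT_LOG_FILE_SIZE;
    }
}
```

For instance, physical offset 1073741924 falls 100 bytes into the second file, whose starting offset is 1073741824.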

The commitLog storage format can be read directly from this method:

// org.apache.rocketmq.store.CommitLog#calMsgLength
private static int calMsgLength(int bodyLength, int topicLength, int propertiesLength) {
    final int msgLen = 4 //TOTALSIZE
        + 4 //MAGICCODE
        + 4 //BODYCRC
        + 4 //QUEUEID
        + 4 //FLAG
        + 8 //QUEUEOFFSET
        + 8 //PHYSICALOFFSET
        + 4 //SYSFLAG
        + 8 //BORNTIMESTAMP
        + 8 //BORNHOST
        + 8 //STORETIMESTAMP
        + 8 //STOREHOSTADDRESS
        + 4 //RECONSUMETIMES
        + 8 //Prepared Transaction Offset
        + 4 + (bodyLength > 0 ? bodyLength : 0) //BODY
        + 1 + topicLength //TOPIC
        + 2 + (propertiesLength > 0 ? propertiesLength : 0) //propertiesLength
        + 0;
    return msgLen;
}
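Adding up the constants in calMsgLength gives the fixed overhead of one record. The small Java sketch below groups the same fields by width to make the arithmetic visible; the class name and the example lengths (a 100-byte body and the 9-byte topic name "TopicTest") are only for illustration.

```java
final class MsgLengthExample {
    // Fixed-width fields of one commitLog record, grouped by width.
    static final int FIXED_FIELDS =
          7 * 4   // TOTALSIZE, MAGICCODE, BODYCRC, QUEUEID, FLAG, SYSFLAG, RECONSUMETIMES
        + 7 * 8;  // QUEUEOFFSET, PHYSICALOFFSET, BORNTIMESTAMP, BORNHOST,
                  // STORETIMESTAMP, STOREHOSTADDRESS, Prepared Transaction Offset

    // Length prefixes of the variable parts: 4 (body) + 1 (topic) + 2 (properties).
    static final int LENGTH_PREFIXES = 4 + 1 + 2;

    static int recordLength(int bodyLength, int topicLength, int propertiesLength) {
        return FIXED_FIELDS + LENGTH_PREFIXES
            + Math.max(bodyLength, 0) + topicLength + Math.max(propertiesLength, 0);
    }

    public static void main(String[] args) {
        System.out.println("fixed overhead: " + recordLength(0, 0, 0));
        System.out.println("100-byte body, 9-byte topic: " + recordLength(100, "TopicTest".length(), 0));
    }
}
```

So every message costs 91 bytes on top of its body, topic and properties.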

When messages are pulled by queue offset, the consumeQueue acts like an index. One entry is 20 bytes and contains the commitLog offset, the message size, and the hashcode of the tag. For delayed messages, the tag field stores the timeout (delivery) time instead.

boolean result = this.putMessagePositionInfo(request.getCommitLogOffset(), request.getMsgSize(), tagsCode, request.getConsumeQueueOffset());

// org.apache.rocketmq.store.ConsumeQueue#putMessagePositionInfo
private boolean putMessagePositionInfo(final long offset, final int size, final long tagsCode, final long cqOffset) {
    if (offset <= this.maxPhysicOffset) {
        return true;
    }
    this.byteBufferIndex.flip();
    this.byteBufferIndex.limit(CQ_STORE_UNIT_SIZE);
    // 8 + 4 + 8 = 20
    this.byteBufferIndex.putLong(offset); // physical position in the commitLog
    this.byteBufferIndex.putInt(size); // message size
    this.byteBufferIndex.putLong(tagsCode); // 8-byte tag hash
    ...
}
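Because every entry has the same size, the entry for logical queue offset n sits at byte n * 20. The Java sketch below writes and reads such an entry in a plain ByteBuffer that stands in for a single consumeQueue file; it ignores the split across multiple files, and all names in it are hypothetical rather than RocketMQ APIs.

```java
import java.nio.ByteBuffer;

final class ConsumeQueueEntrySketch {
    static final int CQ_STORE_UNIT_SIZE = 20; // 8 + 4 + 8

    // Write the entry for logical queue offset cqOffset.
    static void write(ByteBuffer file, long cqOffset, long commitLogOffset, int size, long tagsCode) {
        int pos = Math.toIntExact(cqOffset * CQ_STORE_UNIT_SIZE);
        file.putLong(pos, commitLogOffset); // physical position in the commitLog
        file.putInt(pos + 8, size);         // message size
        file.putLong(pos + 12, tagsCode);   // tag hash
    }

    // Read the entry back as {commitLogOffset, size, tagsCode}.
    static long[] read(ByteBuffer file, long cqOffset) {
        int pos = Math.toIntExact(cqOffset * CQ_STORE_UNIT_SIZE);
        return new long[] {file.getLong(pos), file.getInt(pos + 8), file.getLong(pos + 12)};
    }

    public static void main(String[] args) {
        ByteBuffer file = ByteBuffer.allocate(4 * CQ_STORE_UNIT_SIZE);
        write(file, 2, 1_073_741_924L, 200, "TagA".hashCode());
        long[] entry = read(file, 2);
        System.out.println("commitLog offset=" + entry[0] + ", size=" + entry[1]);
    }
}
```

A consumer that knows its queue offset can therefore jump straight to the entry, then use the stored commitLog offset and size to read the message itself.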

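The broker builds an index on each message's UNIQ_KEY and on topic + "#" + key. The index file is structured essentially as a hashmap.

// org.apache.rocketmq.store.index.IndexFile
// 40 + 5000000*4 + 20000000*20
int fileTotalSize = IndexHeader.INDEX_HEADER_SIZE + (hashSlotNum * hashSlotSize) + (indexNum * indexSize);
// one index file is roughly 420 MB; when it is full, a new file is created

Since each index file is a hashmap of its own, a query by key has to go through all the index files.

File structure:

File header
Hash slots
Data section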

// org.apache.rocketmq.store.index.IndexFile#putKey
// a data entry is 20 bytes: keyHash, phyOffset, timeDiff, slotValue
this.mappedByteBuffer.putInt(absIndexPos, keyHash);
this.mappedByteBuffer.putLong(absIndexPos + 4, phyOffset);
this.mappedByteBuffer.putInt(absIndexPos + 4 + 8, (int) timeDiff);
// slotValue here is the number of the previous index entry
this.mappedByteBuffer.putInt(absIndexPos + 4 + 8 + 4, slotValue);
// write the number of the current index entry into the hash slot
this.mappedByteBuffer.putInt(absSlotPos, this.indexHeader.getIndexCount());
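The header / hash slots / data layout behaves like a hashmap with chaining: a slot holds the number of the newest entry, and each entry points back to the previous entry of the same slot. Below is a minimal in-memory Java sketch of that idea. It is not RocketMQ's IndexFile: the header is left empty, the key hash is simplified to hashCode() & 0x7fffffff, matches are by hash only, and all names are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class IndexFileSketch {
    static final int HEADER_SIZE = 40, SLOT_SIZE = 4, ENTRY_SIZE = 20;

    private final int slotNum;
    private final int indexNum;
    private final ByteBuffer buf;
    private int indexCount = 1; // entry numbers start at 1; 0 in a slot means "empty"

    IndexFileSketch(int slotNum, int indexNum) {
        this.slotNum = slotNum;
        this.indexNum = indexNum;
        this.buf = ByteBuffer.allocate((int) totalSize(slotNum, indexNum));
    }

    static long totalSize(long slotNum, long indexNum) {
        return HEADER_SIZE + slotNum * SLOT_SIZE + indexNum * ENTRY_SIZE;
    }

    private static int hash(String key) { return key.hashCode() & 0x7fffffff; } // simplified hash
    private int slotPos(int keyHash) { return HEADER_SIZE + (keyHash % slotNum) * SLOT_SIZE; }
    private int entryPos(int index) { return HEADER_SIZE + slotNum * SLOT_SIZE + index * ENTRY_SIZE; }

    // Returns false when the file is full (the caller would then create a new file).
    boolean putKey(String key, long phyOffset, int timeDiff) {
        if (indexCount >= indexNum) return false;
        int keyHash = hash(key);
        int slot = slotPos(keyHash);
        int previous = buf.getInt(slot);  // number of the previous entry in this slot
        int pos = entryPos(indexCount);
        buf.putInt(pos, keyHash);
        buf.putLong(pos + 4, phyOffset);
        buf.putInt(pos + 12, timeDiff);
        buf.putInt(pos + 16, previous);
        buf.putInt(slot, indexCount);     // the slot now points at the newest entry
        indexCount++;
        return true;
    }

    // Walk the chain from the slot backwards; newest entries come first.
    List<Long> lookup(String key) {
        int keyHash = hash(key);
        List<Long> offsets = new ArrayList<>();
        for (int i = buf.getInt(slotPos(keyHash)); i != 0; i = buf.getInt(entryPos(i) + 16)) {
            if (buf.getInt(entryPos(i)) == keyHash) offsets.add(buf.getLong(entryPos(i) + 4));
        }
        return offsets;
    }
}
```

With the default numbers from the comment above, totalSize(5000000, 20000000) is 420,000,040 bytes, which is the "roughly 420 MB" figure.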

After RocketMQ has written the commitLog, the consumeQueue and indexFile are written asynchronously. This is triggered in org.apache.rocketmq.store.DefaultMessageStore.ReputMessageService#doReput.

// org.apache.rocketmq.store.DefaultMessageStore#DefaultMessageStore
this.dispatcherList = new LinkedList<>();
this.dispatcherList.addLast(new CommitLogDispatcherBuildConsumeQueue());
this.dispatcherList.addLast(new CommitLogDispatcherBuildIndex());

// org.apache.rocketmq.store.DefaultMessageStore#doDispatch
public void doDispatch(DispatchRequest req) {
    for (CommitLogDispatcher dispatcher : this.dispatcherList) {
        dispatcher.dispatch(req);
    }
}
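The dispatch flow can be illustrated with a stripped-down Java sketch. It is a toy model under stated assumptions, not the real ReputMessageService: the commitLog is a plain list of records, Request is a hypothetical stand-in for DispatchRequest, and one call to doReput plays the role of one pass of the background loop, which remembers how far it has already dispatched.

```java
import java.util.ArrayList;
import java.util.List;

final class DispatchSketch {
    // Hypothetical, trimmed-down stand-in for a dispatch request.
    static final class Request {
        final long commitLogOffset;
        final int msgSize;
        Request(long commitLogOffset, int msgSize) {
            this.commitLogOffset = commitLogOffset;
            this.msgSize = msgSize;
        }
    }

    interface Dispatcher {
        void dispatch(Request req);
    }

    private final List<Dispatcher> dispatchers = new ArrayList<>();
    private long reputFromOffset = 0L; // everything below this offset was already dispatched

    void addDispatcher(Dispatcher dispatcher) {
        dispatchers.add(dispatcher);
    }

    // One pass: hand every record not yet dispatched to each dispatcher, in order.
    void doReput(List<Request> commitLog) {
        for (Request req : commitLog) {
            if (req.commitLogOffset < reputFromOffset) continue;
            for (Dispatcher dispatcher : dispatchers) {
                dispatcher.dispatch(req);
            }
            reputFromOffset = req.commitLogOffset + req.msgSize;
        }
    }
}
```

In this model a consumeQueue builder and an index builder would simply be two Dispatcher implementations registered in that order, each seeing every commitLog record exactly once.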
