本文重点关注了系统设计相关的内容,paper后半部分的具体应用此处没有过多涉及。从个人笔记修改而来,因此为英文版本。

Bigtable: A Distributed Storage System for Structured Data

Data model: not a relational data model

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. —— part2

How the map indexed?

(row:string, column:string, time:int64) → string

just like json format, eg:

table{
// ...
"aaaaa" : { //row
"A:foo" : { //col
15 : "y", //timestamp
4 : "m"
},
"A:bar" : { //col
15 : "d",
},
"B:" : { //col
6 : "w"
3 : "o"
1 : "w"
}
},
// ...
}

a particular table: webtable

  • row(also called tablet): reversed URL

    concurrent: single row key is atomic

    lexicographic order

  • col: column families, contents

    family:qualifier

    Access control and both disk and memory accounting

  • timestamp

    avoid collisions: unique timestamp, decreasing order

    garbage-collection mechanism(eg.)


API

C++ read/write

MapReduce + Bigtable


Building Block

Google File System: store log and data files

distributed Google File System

Google SSTable file format: store Bigtable data

K-V map: iterate key/value pairs in a specified key range

  • a sequence of blocks
  • a block index
disk seek or memory seek?

Optionally, SSTable can be completely mapped into memory, which allows us to perform lookups and scans without touching disk.

Chubby: distributed lock service

5 active replicas: 1 master, 4 slave

Paxos algorithm: to keep its replicas consistent in the face of failure

namespace: including directory and small file, op r/w is atomic

session: when expires, lose locks and open handles


Implementation

consist:

  1. library(?) linked to every client
  2. 1 master server(schedule, garbage-collect......)
  3. many tablet server(10-1000 tablets)

As with many single-master distributed storage systems, client data does not move through the master: clients communicate directly with tablet servers for reads and writes.

hierarchy (B+-tree)

Chubby file -> Root tablet -> other METADATA tablets -> UserTables

METADATA: many other things stored in it

Master: schedule & manage

Each tablet is assigned to one tablet server at a time. Bigtable uses Chubby to keep track of tablet servers. When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory. The master monitors this directory (the servers directory) to discover tablet servers.

The essential point for distributed database: lock

The Bigtable is only a series of ops, real data is stored in GFS.(SSTable)

Tablet Representation

memtable: the recently committed updates are stored in memory in a sorted buffer

reconstruct: redo points in commit logs

Compactions

As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS.

minor(memtable) -> major(SSTable) compaction


Refinement

locality group

Clients can group multiple column families together into a locality group. A separate SSTable is generated for each locality group in each tablet.

This section describes portions of the implementation in more detail in order to highlight these refinements.

in-memory locality groups are loaded lazily

storage: compression

read performance: caching

Bloom filters

commit-log

Speeding up tablet recovery

Exploiting immutability


Performance Evaluation


Lesson

  1. large distributed systems are vulnerable to many types of failures
  2. it is important to delay adding new features until it is clear how the new features will be used
  3. the importance of proper system-level monitoring
  4. the value of simple designs

google三驾马车之一:Bigtable解读(英文版)的更多相关文章

  1. 分布式系统漫谈一 —— Google三驾马车: GFS,mapreduce,Bigtable

    分布式系统学习必读文章!!!! 原文:http://blog.sina.com.cn/s/blog_4ed630e801000bi3.html 分布式系统漫谈一 —— Google三驾马车: GFS, ...

  2. [MapReduce] Google三驾马车:GFS、MapReduce和Bigtable

    声明:此文转载自博客开发团队的博客,尊重原创工作.该文适合学分布式系统之前,作为背景介绍来读. 谈到分布式系统,就不得不提Google的三驾马车:Google FS[1],MapReduce[2],B ...

  3. Google三驾马车:GFS、MapReduce和Bigtable

    谈到分布式系统,就不得不提Google的三驾马车:Google fs[1],Mapreduce[2],Bigtable[3]. 虽然Google没有公布这三个产品的源码,但是他发布了这三个产品的详细设 ...

  4. Google三驾马车

    Google旧三驾马车: GFS,mapreduce,Bigtable http://blog.sina.com.cn/s/blog_4ed630e801000bi3.html Google新三驾马车 ...

  5. 【技术与商业案例解读笔记】095:Google大数据三驾马车笔记

     1.谷歌三驾马车地位 [关键词]开启时代,指明方向 聊起大数据,我们通常言必称谷歌,谷歌有“三驾马车”:谷歌文件系统(GFS).MapReduce和BigTable.谷歌的“三驾马车”开启了大数据时 ...

  6. Childlife旗下三驾马车

    Childlife旗下,尤其以 “提高免疫力”为口号的“三驾马车”:第一防御液.VC.紫雏菊,是相当热门的海淘产品.据说这是一系列“成分天然.有效治愈感冒提升免疫力.由美国著名儿科医生研发”的药物.

  7. Ubuntu 安装 k8s 三驾马车 kubelet kubeadm kubectl

    Ubuntu 版本是 18.04 ,用的是阿里云服务器,记录一下自己实际安装过程的操作步骤. 安装 docker 安装所需的软件 apt-get update apt-get install -y a ...

  8. Qt 学习笔记 - 第三章 - Qt的三驾马车之一 - 串口编程 + 程序打包成Windows软件

    Qt 学习笔记全系列传送门: Qt 学习笔记 - 第一章 - 快速开始.信号与槽 Qt 学习笔记 - 第二章 - 添加图片.布局.界面切换 [本章]Qt 学习笔记 - 第三章 - Qt的三驾马车之一 ...

  9. 更强、更稳、更高效:解读 etcd 技术升级的三驾马车

    点击下载<不一样的 双11 技术:阿里巴巴经济体云原生实践> 本文节选自<不一样的 双11 技术:阿里巴巴经济体云原生实践>一书,点击上方图片即可下载! 作者 | 陈星宇(宇慕 ...

  10. itemKNN发展史----推荐系统的三篇重要的论文解读

    itemKNN发展史----推荐系统的三篇重要的论文解读 本文用到的符号标识 1.Item-based CF 基本过程: 计算相似度矩阵 Cosine相似度 皮尔逊相似系数 参数聚合进行推荐 根据用户 ...

随机推荐

  1. shell脚本(9)-流程控制for

    一.循环介绍 for循环叫做条件循环,或者for i in,可以通过for实现流程控制 二.for语法 1.for语法一:for in for var in value1 value2 ...... ...

  2. spring-transaction源码分析(3)Transactional事务失效原因

    问题概述 在Transactional方法中使用this方式调用另一个Transactional方法时,拦截器无法拦截到被调用方法,严重时会使事务失效. 类似以下代码: @Transactional ...

  3. 01-module/分频器/激励写法

    1.module module有出入接口,输出接口 module有时钟和复位 // input clock; rest_n; // n表示低电平复位 //output o_data; module m ...

  4. CDC设计实例-02

    CDC设计实例 加速器 假设要处理一项业务比如图像处理,有两种方向,第一种选择一些通用的处理器CPU\GPU\DSP等通用的处理器,第二种是将算法映射成IP,直接使用IP进行处理图像处理等专门的业务就 ...

  5. 【FreeRTOS】内核查找最高优先级就绪任务

    查找最高优先级就绪任务 FreeRTOS\Source\tasks.c #if ( configUSE_PORT_OPTIMISED_TASK_SELECTION == 0 ) /* If confi ...

  6. [转帖]从小白到精通:揭秘perf工具的全部功能与操作技巧

    https://zhuanlan.zhihu.com/p/664396453 ​ 目录 收起 一.引言 二.理解perf工具的基本概念 三.安装与配置perf工具 3.1.不同操作系统的perf工具安 ...

  7. [转帖]RabbitMQ学习笔记:内置Prometheus支持rabbit_prometheus插件

    从3.8.0开始,RabbitMQ提供内置的Prometheus和Grafana支持. rabbitmq_prometheus插件中提供了对Prometheus指标收集的支持.该插件以Promethe ...

  8. [转帖]火狐URL默认打开为HTTPS,切换成http形式

    火狐在当前及未来版本默认URL采用HTTPS进行链接,但个人习惯,某些网站不是https,改http在响应超时状态也会切成https,将默认为http. edge,chrome 依然还是http为主要 ...

  9. [转帖]KingbaseES wal(xlog) 日志清理故障恢复案例

    https://www.cnblogs.com/kingbase/p/16266365.html 案例说明:在通过sys_archivecleanup工具手工清理wal日志时,在control文件中查 ...

  10. 【转帖】高性能异步io机制:io_uring

    文章目录 1.性能测试 1.1.FIO 1.2.rust_echo_benc 2.io_uring 2.1.io_uring_setup 2.2.io_uring_enter 2.3.io_uring ...