本文重点关注了系统设计相关的内容,paper后半部分的具体应用此处没有过多涉及。从个人笔记修改而来,因此为英文版本。

Bigtable: A Distributed Storage System for Structured Data

Data model: not a relational data model

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. —— part2

How the map indexed?

(row:string, column:string, time:int64) → string

just like json format, eg:

table{
// ...
"aaaaa" : { //row
"A:foo" : { //col
15 : "y", //timestamp
4 : "m"
},
"A:bar" : { //col
15 : "d",
},
"B:" : { //col
6 : "w"
3 : "o"
1 : "w"
}
},
// ...
}

a particular table: webtable

  • row(also called tablet): reversed URL

    concurrent: single row key is atomic

    lexicographic order

  • col: column families, contents

    family:qualifier

    Access control and both disk and memory accounting

  • timestamp

    avoid collisions: unique timestamp, decreasing order

    garbage-collection mechanism(eg.)


API

C++ read/write

MapReduce + Bigtable


Building Block

Google File System: store log and data files

distributed Google File System

Google SSTable file format: store Bigtable data

K-V map: iterate key/value pairs in a specified key range

  • a sequence of blocks
  • a block index
disk seek or memory seek?

Optionally, SSTable can be completely mapped into memory, which allows us to perform lookups and scans without touching disk.

Chubby: distributed lock service

5 active replicas: 1 master, 4 slave

Paxos algorithm: to keep its replicas consistent in the face of failure

namespace: including directory and small file, op r/w is atomic

session: when expires, lose locks and open handles


Implementation

consist:

  1. library(?) linked to every client
  2. 1 master server(schedule, garbage-collect......)
  3. many tablet server(10-1000 tablets)

As with many single-master distributed storage systems, client data does not move through the master: clients communicate directly with tablet servers for reads and writes.

hierarchy (B+-tree)

Chubby file -> Root tablet -> other METADATA tablets -> UserTables

METADATA: many other things stored in it

Master: schedule & manage

Each tablet is assigned to one tablet server at a time. Bigtable uses Chubby to keep track of tablet servers. When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory. The master monitors this directory (the servers directory) to discover tablet servers.

The essential point for distributed database: lock

The Bigtable is only a series of ops, real data is stored in GFS.(SSTable)

Tablet Representation

memtable: the recently committed updates are stored in memory in a sorted buffer

reconstruct: redo points in commit logs

Compactions

As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS.

minor(memtable) -> major(SSTable) compaction


Refinement

locality group

Clients can group multiple column families together into a locality group. A separate SSTable is generated for each locality group in each tablet.

This section describes portions of the implementation in more detail in order to highlight these refinements.

in-memory locality groups are loaded lazily

storage: compression

read performance: caching

Bloom filters

commit-log

Speeding up tablet recovery

Exploiting immutability


Performance Evaluation


Lesson

  1. large distributed systems are vulnerable to many types of failures
  2. it is important to delay adding new features until it is clear how the new features will be used
  3. the importance of proper system-level monitoring
  4. the value of simple designs

google三驾马车之一:Bigtable解读(英文版)的更多相关文章

  1. 分布式系统漫谈一 —— Google三驾马车: GFS,mapreduce,Bigtable

    分布式系统学习必读文章!!!! 原文:http://blog.sina.com.cn/s/blog_4ed630e801000bi3.html 分布式系统漫谈一 —— Google三驾马车: GFS, ...

  2. [MapReduce] Google三驾马车:GFS、MapReduce和Bigtable

    声明:此文转载自博客开发团队的博客,尊重原创工作.该文适合学分布式系统之前,作为背景介绍来读. 谈到分布式系统,就不得不提Google的三驾马车:Google FS[1],MapReduce[2],B ...

  3. Google三驾马车:GFS、MapReduce和Bigtable

    谈到分布式系统,就不得不提Google的三驾马车:Google fs[1],Mapreduce[2],Bigtable[3]. 虽然Google没有公布这三个产品的源码,但是他发布了这三个产品的详细设 ...

  4. Google三驾马车

    Google旧三驾马车: GFS,mapreduce,Bigtable http://blog.sina.com.cn/s/blog_4ed630e801000bi3.html Google新三驾马车 ...

  5. 【技术与商业案例解读笔记】095:Google大数据三驾马车笔记

     1.谷歌三驾马车地位 [关键词]开启时代,指明方向 聊起大数据,我们通常言必称谷歌,谷歌有“三驾马车”:谷歌文件系统(GFS).MapReduce和BigTable.谷歌的“三驾马车”开启了大数据时 ...

  6. Childlife旗下三驾马车

    Childlife旗下,尤其以 “提高免疫力”为口号的“三驾马车”:第一防御液.VC.紫雏菊,是相当热门的海淘产品.据说这是一系列“成分天然.有效治愈感冒提升免疫力.由美国著名儿科医生研发”的药物.

  7. Ubuntu 安装 k8s 三驾马车 kubelet kubeadm kubectl

    Ubuntu 版本是 18.04 ,用的是阿里云服务器,记录一下自己实际安装过程的操作步骤. 安装 docker 安装所需的软件 apt-get update apt-get install -y a ...

  8. Qt 学习笔记 - 第三章 - Qt的三驾马车之一 - 串口编程 + 程序打包成Windows软件

    Qt 学习笔记全系列传送门: Qt 学习笔记 - 第一章 - 快速开始.信号与槽 Qt 学习笔记 - 第二章 - 添加图片.布局.界面切换 [本章]Qt 学习笔记 - 第三章 - Qt的三驾马车之一 ...

  9. 更强、更稳、更高效:解读 etcd 技术升级的三驾马车

    点击下载<不一样的 双11 技术:阿里巴巴经济体云原生实践> 本文节选自<不一样的 双11 技术:阿里巴巴经济体云原生实践>一书,点击上方图片即可下载! 作者 | 陈星宇(宇慕 ...

  10. itemKNN发展史----推荐系统的三篇重要的论文解读

    itemKNN发展史----推荐系统的三篇重要的论文解读 本文用到的符号标识 1.Item-based CF 基本过程: 计算相似度矩阵 Cosine相似度 皮尔逊相似系数 参数聚合进行推荐 根据用户 ...

随机推荐

  1. MetaGPT day02: MetaGPT Role源码分析

    MetaGPT源码分析 思维导图 MetaGPT版本为v0.4.0,如下是from metagpt.roles import Role,Role类执行Role.run时的思维导图: 概述 其中最重要的 ...

  2. 机器学习-决策树系列-Adaboost算法-集成学习-29

    目录 1. adaboost算法的基本思想 2. 具体实现 1. adaboost算法的基本思想 集成学习是将多个弱模型集成在一起 变成一个强模型 提高模型的准确率,一般有如下两种: bagging: ...

  3. CF1656F Parametric MST 题解

    为了便于解题,先对 \(a\) 数组从小到大进行排序. 首先,根据定义可以得出总价值的表达式: \[\begin{aligned} W&=\sum\limits_{(u,v)\in E}[a_ ...

  4. Jupyter Notebook报错'500 : Internal Server Error'的解决方法

    问题根因 Jupyter相关的软件包版本匹配存在问题,或者历史上安装过Jupyter相关的配套软件但是有残留.大部分网上的博客都是推荐用pip重装jupyter或者nbconvert,亲测无法解决该问 ...

  5. 05-Shell索引数组变量

    1.介绍 Shell 支持数组(Array),数组是若干数据的集合,其中的每一份数据都称为数组的元素. 注意Bash Shell 只支持一维数组,不支持多维数组. 2.数组的定义 2.1 语法 在 S ...

  6. 类外static函数定义要不要加static关键字?

    类外static函数定义要不要加static关键字? 先说答案:不需要. 错误代码: #include<iostream> #include<memory> using nam ...

  7. C#操作 excel 表格

    nuget引入: EPPlus.Core FileInfo file = new FileInfo(@"d:\test.xlsx"); using (ExcelPackage pa ...

  8. Mygin 实现简单Http

    本篇是完全参考gin的功能,自己手动实现一个类似的功能,帮助自己理解和学习gin框架 目的 简单介绍net/http库以及http.Handler接口 实现简单的功能 标准库启动Web服务 impor ...

  9. [转帖]如何使用coredump

    一.coredump 当用户态进程出现异常后,在该进程的执行目录下生成对应的coredump文件,如果我们想将coredump生成的位置做改变,就需要如下设置. echo "/home/co ...

  10. [转帖]Skip List--跳表(全网最详细的跳表文章没有之一)

    https://www.jianshu.com/p/9d8296562806 跳表是一种神奇的数据结构,因为几乎所有版本的大学本科教材上都没有跳表这种数据结构,而且神书<算法导论>.< ...