本文重点关注了系统设计相关的内容,paper后半部分的具体应用此处没有过多涉及。从个人笔记修改而来,因此为英文版本。

Bigtable: A Distributed Storage System for Structured Data

Data model: not a relational data model

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. —— part2

How the map indexed?

(row:string, column:string, time:int64) → string

just like json format, eg:

table{
// ...
"aaaaa" : { //row
"A:foo" : { //col
15 : "y", //timestamp
4 : "m"
},
"A:bar" : { //col
15 : "d",
},
"B:" : { //col
6 : "w"
3 : "o"
1 : "w"
}
},
// ...
}

a particular table: webtable

  • row(also called tablet): reversed URL

    concurrent: single row key is atomic

    lexicographic order

  • col: column families, contents

    family:qualifier

    Access control and both disk and memory accounting

  • timestamp

    avoid collisions: unique timestamp, decreasing order

    garbage-collection mechanism(eg.)


API

C++ read/write

MapReduce + Bigtable


Building Block

Google File System: store log and data files

distributed Google File System

Google SSTable file format: store Bigtable data

K-V map: iterate key/value pairs in a specified key range

  • a sequence of blocks
  • a block index
disk seek or memory seek?

Optionally, SSTable can be completely mapped into memory, which allows us to perform lookups and scans without touching disk.

Chubby: distributed lock service

5 active replicas: 1 master, 4 slave

Paxos algorithm: to keep its replicas consistent in the face of failure

namespace: including directory and small file, op r/w is atomic

session: when expires, lose locks and open handles


Implementation

consist:

  1. library(?) linked to every client
  2. 1 master server(schedule, garbage-collect......)
  3. many tablet server(10-1000 tablets)

As with many single-master distributed storage systems, client data does not move through the master: clients communicate directly with tablet servers for reads and writes.

hierarchy (B+-tree)

Chubby file -> Root tablet -> other METADATA tablets -> UserTables

METADATA: many other things stored in it

Master: schedule & manage

Each tablet is assigned to one tablet server at a time. Bigtable uses Chubby to keep track of tablet servers. When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory. The master monitors this directory (the servers directory) to discover tablet servers.

The essential point for distributed database: lock

The Bigtable is only a series of ops, real data is stored in GFS.(SSTable)

Tablet Representation

memtable: the recently committed updates are stored in memory in a sorted buffer

reconstruct: redo points in commit logs

Compactions

As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS.

minor(memtable) -> major(SSTable) compaction


Refinement

locality group

Clients can group multiple column families together into a locality group. A separate SSTable is generated for each locality group in each tablet.

This section describes portions of the implementation in more detail in order to highlight these refinements.

in-memory locality groups are loaded lazily

storage: compression

read performance: caching

Bloom filters

commit-log

Speeding up tablet recovery

Exploiting immutability


Performance Evaluation


Lesson

  1. large distributed systems are vulnerable to many types of failures
  2. it is important to delay adding new features until it is clear how the new features will be used
  3. the importance of proper system-level monitoring
  4. the value of simple designs

google三驾马车之一:Bigtable解读(英文版)的更多相关文章

  1. 分布式系统漫谈一 —— Google三驾马车: GFS,mapreduce,Bigtable

    分布式系统学习必读文章!!!! 原文:http://blog.sina.com.cn/s/blog_4ed630e801000bi3.html 分布式系统漫谈一 —— Google三驾马车: GFS, ...

  2. [MapReduce] Google三驾马车:GFS、MapReduce和Bigtable

    声明:此文转载自博客开发团队的博客,尊重原创工作.该文适合学分布式系统之前,作为背景介绍来读. 谈到分布式系统,就不得不提Google的三驾马车:Google FS[1],MapReduce[2],B ...

  3. Google三驾马车:GFS、MapReduce和Bigtable

    谈到分布式系统,就不得不提Google的三驾马车:Google fs[1],Mapreduce[2],Bigtable[3]. 虽然Google没有公布这三个产品的源码,但是他发布了这三个产品的详细设 ...

  4. Google三驾马车

    Google旧三驾马车: GFS,mapreduce,Bigtable http://blog.sina.com.cn/s/blog_4ed630e801000bi3.html Google新三驾马车 ...

  5. 【技术与商业案例解读笔记】095:Google大数据三驾马车笔记

     1.谷歌三驾马车地位 [关键词]开启时代,指明方向 聊起大数据,我们通常言必称谷歌,谷歌有“三驾马车”:谷歌文件系统(GFS).MapReduce和BigTable.谷歌的“三驾马车”开启了大数据时 ...

  6. Childlife旗下三驾马车

    Childlife旗下,尤其以 “提高免疫力”为口号的“三驾马车”:第一防御液.VC.紫雏菊,是相当热门的海淘产品.据说这是一系列“成分天然.有效治愈感冒提升免疫力.由美国著名儿科医生研发”的药物.

  7. Ubuntu 安装 k8s 三驾马车 kubelet kubeadm kubectl

    Ubuntu 版本是 18.04 ,用的是阿里云服务器,记录一下自己实际安装过程的操作步骤. 安装 docker 安装所需的软件 apt-get update apt-get install -y a ...

  8. Qt 学习笔记 - 第三章 - Qt的三驾马车之一 - 串口编程 + 程序打包成Windows软件

    Qt 学习笔记全系列传送门: Qt 学习笔记 - 第一章 - 快速开始.信号与槽 Qt 学习笔记 - 第二章 - 添加图片.布局.界面切换 [本章]Qt 学习笔记 - 第三章 - Qt的三驾马车之一 ...

  9. 更强、更稳、更高效:解读 etcd 技术升级的三驾马车

    点击下载<不一样的 双11 技术:阿里巴巴经济体云原生实践> 本文节选自<不一样的 双11 技术:阿里巴巴经济体云原生实践>一书,点击上方图片即可下载! 作者 | 陈星宇(宇慕 ...

  10. itemKNN发展史----推荐系统的三篇重要的论文解读

    itemKNN发展史----推荐系统的三篇重要的论文解读 本文用到的符号标识 1.Item-based CF 基本过程: 计算相似度矩阵 Cosine相似度 皮尔逊相似系数 参数聚合进行推荐 根据用户 ...

随机推荐

  1. vmware超融合基础安装与配置

    目录 vmware超融合 安装配置ESXI 安装VMware vCenter Server 安装vCenter插件 安装vCenter 使用VMware Vsphere Client登录Vcenter ...

  2. 解决 idea maven plugins 报红波浪线

    导入新项目到 idea 的时候,由于依赖的环境以及项目中 maven 编译所依赖的pom的版本不同,很多时候导入到idea的时候 maven Plugins 会出现报红的情况,这是由于maven仓库中 ...

  3. Git-历史版本切换-log-reset

  4. process-exporter 监控linux机器进程使用情况

    process-exporter 监控linux机器进程使用情况 背景 前期一直想进行 关于 IP地址的来源和目的地的监控 但是耗费了很多精力都没有搞定. 感觉应该去偷师一下安全监控软件的使用方式. ...

  5. [转帖]查看mysql分区名和各分区数据量

    – 查看mysql分区名和各分区数据量 SELECT table_name, partition_name, table_rows FROM information_schema.PARTITIONS ...

  6. [转帖]浅析TiDB二阶段提交

    https://cloud.tencent.com/developer/article/1608073 关键内容说明: TiDB 对于每个事务,会涉及改动的所有key中,选择出一个作为当前事务的Pri ...

  7. [转帖]vdbench

    https://www.cnblogs.com/AgainstTheWind/p/9869513.html 一.vdbench安装1.安装java:java -version(vdbench的运行依赖 ...

  8. [转帖]Intel固态硬盘总结

    https://www.cnblogs.com/hongdada/p/17326247.html 2012年推出的S3700,采用的是25nm闪存颗粒. 2015年推出s3710,采用的是20nm闪存 ...

  9. [转帖]A17再次证明苹果才是王者,组装芯片的安卓手机给它提鞋都不配

    http://news.sohu.com/a/653472711_121124371 在挤了两代牙膏之后,苹果终于拿出了性能大幅提升的A17处理器,外媒传出A17处理器的性能提升幅度至少超过四成,相比 ...

  10. Redis scan等命令的学习与研究

    Redis scan等命令的学习与研究 摘要 背景跟前几天说的一个问题类似. 为了验证自己的设想, 所以晚上继续写脚本进行了一轮次的验证. 不过上次讨论时,打击好像都没听懂我说的 所以这次准备从基础开 ...