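One of Google's "troika" papers: notes on Bigtable (English version)
This post focuses on the system-design content; the concrete applications in the second half of the paper are not covered in much detail here. It was adapted from personal notes, which is why it is written in English.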
Bigtable: A Distributed Storage System for Structured Data
Data model: not a relational data model
A Bigtable is a sparse, distributed, persistent multidimensional sorted map. (Section 2 of the paper)
How is the map indexed?
(row:string, column:string, time:int64) → string
It looks much like a nested JSON object, e.g.:
table {
  // ...
  "aaaaa" : {        // row
    "A:foo" : {      // col
      15 : "y",      // timestamp
      4 : "m"
    },
    "A:bar" : {      // col
      15 : "d"
    },
    "B:" : {         // col
      6 : "w",
      3 : "o",
      1 : "w"
    }
  },
  // ...
}
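To make the indexing concrete, here is a minimal in-memory sketch of the same map. The class name `ToyTable` and its methods are made up for illustration and are not Bigtable's API; the sketch only assumes what the notes state: cells are addressed by (row, column, timestamp), rows are sorted, and newer versions come first.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of the (row, column, timestamp) -> value map (hypothetical API)."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}; absent cells cost nothing (sparse).
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._rows.get(row, {}).get(column, {})
        for ts in sorted(versions, reverse=True):  # newest version first
            if timestamp is None or ts <= timestamp:
                return versions[ts]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row, in sorted order."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                for column in sorted(self._rows[row]):
                    for ts in sorted(self._rows[row][column], reverse=True):
                        yield row, column, ts, self._rows[row][column][ts]


table = ToyTable()
table.put("aaaaa", "A:foo", 15, "y")
table.put("aaaaa", "A:foo", 4, "m")
table.put("aaaaa", "B:", 6, "w")
```

A real tablet keeps the data sorted on disk rather than sorting on every read; the repeated `sorted` calls here only keep the sketch short.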
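a particular table: Webtable
- row: reversed URL
  concurrency: every read or write under a single row key is atomic
  rows are kept in lexicographic order; each row range is called a tablet
- col: column families, e.g. contents
  family:qualifier
  access control and both disk and memory accounting are performed per column family
- timestamp
  avoid collisions: unique timestamps; versions are stored in decreasing timestamp order
  garbage-collection mechanism (e.g. keep only the last n versions, or only versions that are new enough)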
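Two of the points above are easy to sketch: reversing the hostname so that pages of one domain sort next to each other, and a keep-last-n garbage-collection policy for cell versions. The function names `row_key` and `keep_last_n` are hypothetical helpers, not part of any Bigtable client library.

```python
from urllib.parse import urlsplit


def row_key(url):
    """Reverse the hostname components so pages of one domain sort together."""
    parts = urlsplit(url)
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    return reversed_host + parts.path


def keep_last_n(versions, n):
    """Garbage-collect one cell: keep only the n newest timestamped versions."""
    newest = sorted(versions, reverse=True)[:n]
    return {ts: versions[ts] for ts in newest}
```

For example, `row_key("http://maps.google.com/index.html")` gives `com.google.maps/index.html`, which sorts directly next to other `com.google.*` rows.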
API
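C++ client API for reading and writing
MapReduce + Bigtable: a table can be used as both an input source and an output target for MapReduce jobs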
Building Blocks
Google File System (GFS): the distributed file system that stores log and data files
Google SSTable file format: store Bigtable data
K-V map (persistent, ordered, immutable): look up a key, or iterate over key/value pairs in a specified key range
- a sequence of blocks
- a block index
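disk seek or memory seek? The block index is loaded into memory when the SSTable is opened, so a lookup is a binary search in the in-memory index followed by a single disk seek to read the block.
Optionally, an SSTable can be completely mapped into memory, which allows lookups and scans without touching disk.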
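The block-plus-index layout can be sketched as follows. `ToySSTable` is an illustrative stand-in: the block size, the choice of the last key of each block as the index entry, and the `block_reads` counter (standing in for disk seeks) are all assumptions of the sketch, not details of Google's file format.

```python
import bisect


class ToySSTable:
    """Immutable sorted key/value store, modeled as small blocks plus an in-memory index."""

    def __init__(self, items, block_size=2):
        items = sorted(items, key=lambda kv: kv[0])
        # Each block is a sorted list of (key, value) pairs.
        self.blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        # The index holds the last key of every block and is kept in memory.
        self.index = [block[-1][0] for block in self.blocks]
        self.block_reads = 0  # stands in for disk seeks

    def _read_block(self, i):
        self.block_reads += 1
        return self.blocks[i]

    def get(self, key):
        i = bisect.bisect_left(self.index, key)  # binary search in the index
        if i == len(self.index):
            return None  # key is larger than every stored key: no block read needed
        for k, v in self._read_block(i):
            if k == key:
                return v
        return None

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end."""
        i = bisect.bisect_left(self.index, start)
        while i < len(self.blocks):
            for k, v in self._read_block(i):
                if k >= end:
                    return
                if k >= start:
                    yield k, v
            i += 1
```

A point lookup touches exactly one block, which is the "single disk seek" property; a key beyond the last index entry is rejected without reading any block.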
Chubby: distributed lock service
5 active replicas: 1 master, 4 slave
Paxos algorithm: to keep its replicas consistent in the face of failure
namespace: directories and small files; each can be used as a lock, and reads and writes to a file are atomic
session: when it expires, the client loses its locks and open handles
Implementation
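three major components:
- a library that is linked into every client
- 1 master server (tablet assignment, load balancing, garbage collection of GFS files, schema changes)
- many tablet servers (each typically manages 10 to 1000 tablets)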
As with many single-master distributed storage systems, client data does not move through the master: clients communicate directly with tablet servers for reads and writes.
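tablet location: a three-level hierarchy, analogous to a B+-tree
Chubby file -> Root tablet -> other METADATA tablets -> UserTables
METADATA: stores each tablet's location under a row key built from the table identifier and the tablet's end row; it also holds secondary information such as a log of events for each tablet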
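The lookup through this hierarchy can be sketched with plain sorted lists. Everything concrete below is invented for the example: the Chubby path, the tablet and server names, and the `MAX` sentinel that stands for "end of table". The only thing taken from the notes is the shape: a Chubby file points at the root tablet, the root tablet indexes METADATA tablets, and METADATA rows are keyed by (table, end row).

```python
import bisect

MAX = "\uffff"  # hypothetical sentinel end row meaning "to the end of the table"


def find_entry(tablet, key):
    """Return the value of the first entry whose end key is >= key."""
    end_keys = [end for end, _ in tablet]
    return tablet[bisect.bisect_left(end_keys, key)][1]


# Made-up cluster state. Each tablet is a sorted list of (end key, target) pairs.
chubby = {"/bigtable/root-location": "root"}
tablets = {
    "root": [(("webtable", "com.m"), "meta-1"), (("webtable", MAX), "meta-2")],
    "meta-1": [(("users", MAX), "ts-c"), (("webtable", "com.m"), "ts-a")],
    "meta-2": [(("webtable", MAX), "ts-b")],
}


def locate(table, row):
    """Return the tablet server responsible for (table, row)."""
    key = (table, row)
    root = tablets[chubby["/bigtable/root-location"]]  # level 1: Chubby file -> root tablet
    meta = tablets[find_entry(root, key)]              # level 2: root -> METADATA tablet
    return find_entry(meta, key)                       # level 3: METADATA -> user tablet
```

In the real system the client library caches tablet locations, so most requests skip the walk entirely; the sketch omits the cache.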
Master: schedule & manage
Each tablet is assigned to one tablet server at a time. Bigtable uses Chubby to keep track of tablet servers. When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory. The master monitors this directory (the servers directory) to discover tablet servers.
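A rough sketch of this registration protocol, using an in-memory stand-in for Chubby. `ToyLockService` and the `/servers/` path are assumptions made for the example, not the real Chubby interface, and the sketch simplifies one detail: here an expired session removes the file outright, whereas in Bigtable the master notices the lost lock and deletes the file itself.

```python
class ToyLockService:
    """In-memory stand-in for a Chubby directory of lock files (not the real API)."""

    def __init__(self):
        self.locks = {}  # file path -> owner holding the exclusive lock

    def create_and_lock(self, path, owner):
        """Create a uniquely named file and lock it; fail if it already exists."""
        if path in self.locks:
            return False
        self.locks[path] = owner
        return True

    def expire_session(self, owner):
        """A lost session releases every lock the owner held."""
        self.locks = {p: o for p, o in self.locks.items() if o != owner}

    def list_dir(self, prefix):
        """What the master sees when it monitors the servers directory."""
        return sorted(p for p in self.locks if p.startswith(prefix))
```

The master would call something like `list_dir("/servers/")` periodically and reassign the tablets of any server whose file has disappeared.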
The essential point for a distributed database: locking
Bigtable itself is essentially a layer of operations; the real data lives in GFS (as SSTables and commit logs).
Tablet Representation
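memtable: recently committed updates are written to the commit log and kept in memory in a sorted buffer
recovery: the tablet server reads the tablet's SSTable list and redo points from METADATA, then rebuilds the memtable by reapplying the updates committed since the redo points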
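The interplay of commit log, memtable, and redo point can be sketched like this. `ToyTablet` is hypothetical: a Python list plays the role of the commit log in GFS, and the redo point is simply an index into that list.

```python
class ToyTablet:
    """Toy write path: log first, then memtable; recovery replays from the redo point."""

    def __init__(self):
        self.commit_log = []  # durable and append-only (stands in for a GFS file)
        self.sstables = []    # durable, immutable sorted runs
        self.redo_point = 0   # log position from which recovery must replay
        self.memtable = {}    # volatile: lost on a crash

    def write(self, key, value):
        self.commit_log.append((key, value))  # make the update durable first
        self.memtable[key] = value            # then apply it in memory

    def flush(self):
        """Persist the memtable as an SSTable and advance the redo point."""
        self.sstables.append(sorted(self.memtable.items()))
        self.redo_point = len(self.commit_log)
        self.memtable = {}

    def crash(self):
        self.memtable = {}

    def recover(self):
        """Rebuild the memtable from log entries after the redo point."""
        replayed = 0
        for key, value in self.commit_log[self.redo_point:]:
            self.memtable[key] = value
            replayed += 1
        return replayed
```

Only the updates written after the last flush need replaying, which is why advancing the redo point keeps recovery short.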
Compactions
As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS.
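minor compaction: memtable -> new SSTable
merging compaction: a few SSTables + memtable -> one new SSTable
major compaction: all SSTables -> exactly one SSTable, with deleted data dropped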
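A small sketch of how minor and major compactions fit together with reads. The class, the tiny threshold, and the `TOMBSTONE` marker for deletions are assumptions for illustration; merging compactions are left out to keep it short.

```python
TOMBSTONE = object()  # marker recorded for a deleted key


class ToyTabletStore:
    def __init__(self, threshold=2):
        self.memtable = {}
        self.sstables = []  # newest first; each is an immutable tuple of sorted pairs
        self.threshold = threshold

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.threshold:
            self.minor_compaction()

    def delete(self, key):
        self.write(key, TOMBSTONE)

    def minor_compaction(self):
        """Freeze the memtable, write it out as an SSTable, start a fresh memtable."""
        frozen = tuple(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.sstables.insert(0, frozen)
        self.memtable = {}

    def major_compaction(self):
        """Rewrite all SSTables into one, dropping deleted entries."""
        merged = {}
        for sstable in reversed(self.sstables):  # oldest first, so newer values win
            merged.update(sstable)
        live = sorted((kv for kv in merged.items() if kv[1] is not TOMBSTONE),
                      key=lambda kv: kv[0])
        self.sstables = [tuple(live)]

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        for source in [self.memtable] + [dict(s) for s in self.sstables]:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None
```

Until a major compaction runs, a deletion is just another entry that shadows older values, which is why reads have to consult the newest data first.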
Refinements
The implementation described above needed a number of refinements to reach the required performance, availability, and reliability; this part of the paper describes them in more detail.
locality group
Clients can group multiple column families together into a locality group. A separate SSTable is generated for each locality group in each tablet.
in-memory locality groups are loaded lazily
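The effect of locality groups can be sketched as a partition of cells by column family. The schema below (`language` and `checksum` in one group, `contents` in another) follows the Webtable example from the paper, but the group names and the helper function are made up.

```python
from collections import defaultdict

# Hypothetical schema: column family -> locality group name.
LOCALITY_GROUPS = {"language": "meta", "checksum": "meta", "contents": "content"}


def split_by_locality_group(cells):
    """Partition (row, column, value) cells into one sorted run per locality group."""
    groups = defaultdict(list)
    for row, column, value in cells:
        family = column.split(":", 1)[0]
        groups[LOCALITY_GROUPS[family]].append((row, column, value))
    return {name: sorted(run) for name, run in groups.items()}


cells = [
    ("com.cnn.www", "contents:", "<html>..."),
    ("com.cnn.www", "language:", "EN"),
    ("com.cnn.www", "checksum:", "abc"),
]
runs = split_by_locality_group(cells)
```

Each run would become its own SSTable, so a scan over the small `meta` group never has to read page contents.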
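storage: compression (chosen per locality group, applied to each SSTable block separately)
read performance: two levels of caching, the Scan Cache (key/value pairs) and the Block Cache (SSTable blocks read from GFS)
Bloom filters: tell whether an SSTable might contain data for a given row/column pair, so most lookups for absent data skip the disk
commit-log: a single commit log per tablet server, shared by all of its tablets
Speeding up tablet recovery: the source server does a minor compaction before a tablet is moved
Exploiting immutability: SSTables are immutable, so reading them needs no synchronization; only the memtable is mutable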
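Of the refinements above, the Bloom filter is the easiest to show in code. This is a generic textbook Bloom filter, not Bigtable's implementation; the bit-array size, the number of hash functions, and the use of SHA-256 with a seed prefix are arbitrary choices for the sketch.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a small false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means the SSTable must be checked."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

A tablet server would keep one such filter per SSTable, keyed by row/column pair, and only read an SSTable from disk when `might_contain` returns True.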
Performance Evaluation
Lessons
- large distributed systems are vulnerable to many types of failures
- it is important to delay adding new features until it is clear how the new features will be used
- the importance of proper system-level monitoring
- the value of simple designs