Reposted from: http://www.linkedin.com/pulse/nosql-cassandra-hbase-rocksdb-siddharth-anand

I've had the pleasure of working in Data Infrastructure for more than 12 years now at companies like eBay, Etsy, Netflix, and LinkedIn. If you are unfamiliar with the term, Data Infrastructure refers to a collection of services that receive, persist, transfer, and provide custom access to data. This includes databases, caches, messaging systems, data streams (for ingress, replication, and egress), ETL, data warehouses, search engines, blob & object stores, filers, graph engines, and other specialty data systems.

Every popular web site invests in building data infrastructure that is performant, scalable, and available. Though the extent of these investments may vary, the problems these companies face often do not. These companies must invariably answer the same class of questions: How do I write a lot of data relatively quickly, access the data in flexible and interesting ways (again, at scale and with low latency), and allow for changes to my system without any interruption to service? In some cases, companies must also defend against forces beyond their control (e.g. power outages at their hosting providers) while providing uninterrupted service to end users.

One technology trend that emerged about 5 years ago and has since reached a level of maturity (at least in Silicon Valley) is NoSQL. NoSQL is really a counter-culture term that signaled a revolutionary approach to database problems. Instead of trying to solve all or most problems in a single database, the NoSQL movement decided to solve a small set of problems well. NoSQL databases did not pretend to be the inevitable replacement for their general-purpose RDBMS counterparts, but they did provide a better solution for certain use-cases. Akin to the ASIC (application-specific integrated circuit) revolution of 2 decades earlier, NoSQL databases came to live alongside their RDBMS counterparts in a data center, just as ASICs came to live alongside general-purpose CPUs on a motherboard.

However, there are still those who are confused by NoSQL DBs (or, more aptly named, ASDBs: application-specific DBs). They struggle to justify giving up their RDBMS installations for the narrower-functioning NoSQL DBs, or they get confused when comparing 2 NoSQL technologies for the same use-case when each was clearly designed for a different purpose.

I'd like to clear up this confusion.

Firstly, it is important to recognize that the NoSQL revolution was sparked by 2 seminal papers: Amazon's Dynamo & Google's Big Table. These 2 papers solve very different problems. The former (Dynamo) attempted to build a distributed, fault-tolerant OLTP store with a very simple data schema (key-value). Dynamo's goal was to reliably carry out shopping-cart actions related to persistence and lookup in the face of data center failures. The latter (Big Table) was built for a company in which web crawlers were constantly inserting large amounts of data and in which search indexers were constantly building new indexes. Search indexes were built using Google's Map-Reduce system, so the Map phase needed an efficient way to scan data. Data was flowing in as a result of crawling and it was being read as a result of Map-related scans that were used to build search indexes or generate analytics.

From these papers came two of the most widely-adopted NoSQL Databases today: Cassandra and HBase. Cassandra leverages elements of the Dynamo distribution model and the Big Table data model. HBase is influenced by Big Table and hence relies on the GFS equivalent in the open-source realm: HDFS.

Cassandra & HBase : Cousins, not Siblings

A common question that I am asked is which one is better. This is the wrong question to ask. They are built for different purposes.

There are some similarities between HBase and Cassandra. Both leverage append-only semantics in data persistence in order to achieve high-speed data ingestion. This is used in writing both the commit logs and core data files (a.k.a. region files in HBase or sstables in Cassandra). Both leverage in-memory data structures to land the writes, choosing to flush the writes out as immutable region files or sstables in a background process. The region files and sstables are opened as append-only for writing. In both systems, reads must reconcile stale data with new data. Since read operations become more expensive as the number of region files or sstables grows, another background process is needed to consolidate or compact the files. Finally, both systems are written in Java.
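The shared write path described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how either system is implemented: the class name `TinyLSM`, the `memtable_limit` parameter, and the in-memory lists standing in for on-disk files are all hypothetical. It only illustrates the sequence of landing writes in memory, flushing immutable sorted files, reconciling on read, and compacting.

```python
import bisect


class TinyLSM:
    """Toy log-structured store: one in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.commit_log = []   # append-only record of every write (stand-in for the on-disk log)
        self.memtable = {}     # in-memory structure where writes land first
        self.sstables = []     # immutable sorted runs, oldest first

    def put(self, key, value):
        self.commit_log.append((key, value))   # append, never update in place
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new immutable, sorted run."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        """Check the newest data first, then older runs; the newest value wins."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest value per key."""
        merged = {}
        for run in self.sstables:   # oldest to newest, so later runs overwrite
            merged.update(run)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []
```

Note that `get` has to look through more runs as flushes accumulate, which is why the read cost grows until `compact` folds the runs back into one.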

The similarities end there. Whereas it may appear that these systems should have similar performance and operational profiles, they do not.

Cassandra for OLTP, HBase for Map-Reduce

The Difference SSDs Make

HBase is co-deployed with Hadoop distributions. This allows Map-Reduce jobs and HBase to share the same HDFS cluster. The hardware profile of the cluster is typically oriented towards the needs of map-reduce jobs: high data throughput workloads that don't require tight time SLAs. As such, the Hadoop software distribution is usually installed on a large but cheap fleet of commodity hardware running magnetic disks. Although this might be reasonable for map-reduce, it is terrible for a database (a.k.a. HBase). Every commit log fsync and every region file flush incurs disk head scheduling penalties.

Cassandra deployments typically leverage SSDs. Cassandra is simply faster thanks to the use of SSDs.

Why Scans Hurt OLTP Performance

HBase is range-partitioned. This makes sense since it was designed to be used by mappers in map-reduce. Mapper scans are very expensive when run in Java with generational garbage collection enabled. Scans are like snow-plows pushing objects out of eden and into the survivor or tenured spaces, even if those force-promoted objects would have died in young gen. This force promotion means that the more expensive old-gen collections are needed to clean up the garbage. Though CMS is pretty good about limiting pause times during old-gen collections, it doesn't compact, so fragmentation will eventually become an issue. When running mixed loads (OLTP and Map-reduce) on HBase, people often see increased OLTP response times caused by the mapper scan-elevated pause times.

Cassandra is mod-partitioned and does not support range scans on contiguous row keys. This makes map-reduce against a running Cassandra cluster far less efficient than it would be on HBase, to the point of not being viable. However, companies like Netflix and Coursera carry out Map-reduce using the SSTable files directly, bypassing this problem. This project is called Aegisthus.
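To make the difference between the two partitioning schemes concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the node count, the split points, and the use of an MD5 hash taken modulo the node count are stand-ins, not the actual placement logic of either Cassandra or HBase.

```python
import bisect
import hashlib

NUM_NODES = 4


def hash_partition(key, num_nodes=NUM_NODES):
    """Mod-style placement: hash the key, then take it modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


# Hypothetical split points: node 0 owns keys below "g", node 1 owns "g" up to "n", and so on.
SPLIT_POINTS = ["g", "n", "t"]


def range_partition(key, split_points=SPLIT_POINTS):
    """Range placement: each node owns one contiguous slice of the sorted key space."""
    return bisect.bisect_right(split_points, key)


def nodes_for_scan(start, end, split_points=SPLIT_POINTS):
    """Nodes a range-partitioned scan over [start, end] has to visit."""
    return list(range(range_partition(start, split_points),
                      range_partition(end, split_points) + 1))
```

With range placement, `nodes_for_scan("a", "c")` touches a single node because neighbouring keys live together. With hashed placement, neighbouring keys such as "apple" and "apricot" can land on unrelated nodes, so a scan over a contiguous key range has no small set of nodes to target and must consult all of them.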

The key take-away is to not use HBase for OLTP use-cases. On the flip side, HBase is better supported in the map-reduce eco-system than Apache Cassandra. Netflix's Aegisthus project is the only way to run map-reduce over Cassandra sstables.

RocksDB, Learning from HBase Issues

A couple of years ago, the same team at Facebook that spearheaded the HBase Messaging work unveiled RocksDB. What is RocksDB? It's HBase with the following differences:

  • Does not use HDFS
  • Simple Key-Value store
  • Runs on the local file system (typically with SSD)
  • No write distribution or multi-node reliability

In other words, it is similar to a single Cassandra node, except it is purely Key --> Value, instead of (rowKey, columnKey) --> Value.
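The two data models can be contrasted with plain Python dictionaries. This is only a sketch of the shapes involved; the sample keys, the `read_row` and `flatten` helpers, and the NUL separator are hypothetical and do not correspond to any RocksDB or Cassandra API.

```python
# RocksDB-style model: a flat mapping from key to value.
kv_store = {}
kv_store[b"user:42"] = b'{"name": "Ada", "city": "London"}'

# Cassandra/HBase-style model: (row key, column key) -> value.
wide_rows = {}
wide_rows[("user:42", "name")] = "Ada"
wide_rows[("user:42", "city")] = "London"


def read_row(store, row_key):
    """Collect every column stored under one row key."""
    return {col: val for (row, col), val in store.items() if row == row_key}


def flatten(store, sep="\x00"):
    """One way to emulate the two-level model on a flat store: join the two keys."""
    return {f"{row}{sep}{col}": val for (row, col), val in store.items()}
```

In the flat model the application decides how to pack a record into a single value, whereas in the two-level model individual columns can be written and read independently.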

Document-oriented DBs

I would be remiss to exclude MongoDB as the other popular NoSQL DB. It is yet a 3rd type of NoSQL DB. It is document-oriented and hence supports nested data types. It has gained a lot of popularity but I have not seen it used at scale. LinkedIn has developed its own document-oriented DB, Espresso. Google similarly wrote a paper a few years ago on F1, its document-oriented DB.

Recap

NoSQL is not one type of DB. It's a term used to classify application-specific DBs. There are trade-offs among them by definition, as they are not built to be general-purpose. Among these, Cassandra and MongoDB have found a place as replacements for OLTP RDBMS where businesses can work around the trade-offs. HBase does not perform well as an OLTP replacement, but has found a niche in the world of map-reduce, where it has its roots.
