Druid and related time-series databases: inverted-index optimizations are used here too

Cattell [6] maintains a great summary of existing scalable SQL and NoSQL data stores. Hu [18] contributed another great summary for streaming databases. Feature-wise, Druid sits somewhere between Google’s Dremel [28] and PowerDrill [17]. Druid has most of the features implemented in Dremel (Dremel handles arbitrary nested data structures, while Druid only allows for a single level of array-based nesting) and many of the interesting compression algorithms mentioned in PowerDrill. Although Druid builds on many of the same principles as other distributed columnar data stores [15], many of these data stores are designed to be more generic key-value stores [23] and do not support computation directly in the storage layer. There are also other data stores designed for some of the same data warehousing issues that Druid is meant to solve. These systems include in-memory databases such as SAP’s HANA [14] and VoltDB [43]. These data stores lack Druid’s low-latency ingestion characteristics. Druid also has native analytical features baked in, similar to ParAccel [34]; however, Druid allows system-wide rolling software updates with no downtime.

Druid is similar to C-Store [38] and LazyBase [8] in that it has two subsystems: a read-optimized subsystem in the historical nodes and a write-optimized subsystem in the real-time nodes. Real-time nodes are designed to ingest a high volume of append-heavy data and do not support data updates. Unlike the two aforementioned systems, Druid is meant for OLAP transactions and not OLTP transactions.

Druid’s low-latency data ingestion features share some similarities with Trident/Storm [27] and Spark Streaming [45]; however, both systems are focused on stream processing, whereas Druid is focused on ingestion and aggregation. Stream processors are great complements to Druid as a means of pre-processing the data before the data enters Druid.
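To make "ingestion and aggregation" concrete, here is a minimal Python sketch of aggregating events as they arrive by grouping them into time buckets per dimension combination. It is an illustration only, not Druid's ingestion code: the function name `roll_up`, the event fields, and the 60-second granularity are all assumptions made for this example.

```python
from collections import defaultdict


def roll_up(events, granularity_seconds, dimensions, metric):
    """Aggregate raw events into (time bucket, dimension values) groups.

    Hypothetical helper: each event is a dict with an integer "timestamp"
    (epoch seconds), the listed dimension fields, and a numeric metric field.
    """
    buckets = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        # Truncate the timestamp down to the start of its bucket.
        bucket_start = event["timestamp"] - event["timestamp"] % granularity_seconds
        key = (bucket_start,) + tuple(event[d] for d in dimensions)
        buckets[key]["count"] += 1
        buckets[key]["sum"] += event[metric]
    return dict(buckets)


# Made-up sample events, for illustration only.
events = [
    {"timestamp": 0, "page": "a", "clicks": 3},
    {"timestamp": 30, "page": "a", "clicks": 4},
    {"timestamp": 61, "page": "a", "clicks": 5},
    {"timestamp": 45, "page": "b", "clicks": 1},
]
aggregated = roll_up(events, 60, ["page"], "clicks")
```

In this sketch the first two events for page "a" collapse into a single aggregated row, so queries touch fewer rows than were ingested. A stream processor placed upstream could clean or enrich the events before they reach such a step.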

There is a class of systems that specialize in queries on top of cluster computing frameworks. Shark [13] is such a system for queries on top of Spark, and Cloudera’s Impala [9] is another system focused on optimizing query performance on top of HDFS. Druid historical nodes download data locally and only work with native Druid indexes. We believe this setup allows for faster query latencies.

Druid leverages a unique combination of algorithms in its architecture. Although we believe no other data store has the same set of functionality as Druid, some of Druid’s optimization techniques, such as using inverted indices to perform fast filters, are also used in other data stores [26].
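The idea of using inverted indices for fast filters can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Druid's implementation: it stores plain Python sets of row ids rather than any compressed structure, and the helper names (`build_inverted_index`, `filter_rows`) and the sample rows are invented for the example.

```python
from collections import defaultdict


def build_inverted_index(rows, dimensions):
    """Map each (dimension, value) pair to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for dim in dimensions:
            index[(dim, row[dim])].add(row_id)
    return index


def filter_rows(index, **equals):
    """Return sorted row ids matching every dimension=value condition (AND)."""
    postings = [index.get((dim, value), set()) for dim, value in equals.items()]
    if not postings:
        return []
    # Intersecting posting sets avoids scanning rows that cannot match.
    return sorted(set.intersection(*postings))


# Made-up sample rows, for illustration only.
rows = [
    {"country": "US", "device": "mobile", "clicks": 18},
    {"country": "US", "device": "desktop", "clicks": 29},
    {"country": "CA", "device": "mobile", "clicks": 19},
    {"country": "CA", "device": "desktop", "clicks": 31},
]
index = build_inverted_index(rows, ["country", "device"])
matches = filter_rows(index, country="CA", device="desktop")
total_clicks = sum(rows[i]["clicks"] for i in matches)
```

The filter is answered by set intersection over the index, and only the matching rows are read to compute the aggregate. The same lookup-then-intersect pattern is what search engines use for term queries, which is the connection the title of this post points at.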

Druid white paper: http://static.druid.io/docs/druid.pdf
