Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica University of California, Berkeley
Many “big data” applications need to act on data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional API, strong consistency, and efficient fault recovery. D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup schemes in streaming databases— parallel recovery of lost state—and unlike previous systems, also mitigate stragglers. We implement D-Streams as an extension to the Spark cluster computing engine that lets users seamlessly intermix streaming, batch and interactive queries. Our system can process over 60 million records/second at sub-second latency on 100 nodes.
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing的更多相关文章
- Discretized Streams, 离散化的流数据处理
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters ...
- FTH: (7156): *** Fault tolerant heap shim applied to current process. This is usually due to previous crashes. ***
这两天在Qtcreator上编译程序的时候莫名其妙的出现了FTH: (7156): *** Fault tolerant heap shim applied to current process. T ...
- Akka的fault tolerant
要想容错,该怎么办? 父actor首先要获知子actor的失败状态,然后确定该怎么办, “怎么办”这回事叫做“supervisorStrategy". // Restart the st ...
- 解决Qt4.8.6+VS2010运行程序提示 FTH: (6512): *** Fault tolerant heap shim applied to current process. This is usually due to previous crashes
这个问题偶尔碰到两次,现在又遇上了,解决办法如下: 打开注册表,设置HKLM\Software\Microsoft\FTH\Enabled 为0 打开CMD,运行Rundll32.exe fthsvc ...
- 翻译-In-Stream Big Data Processing 流式大数据处理
相当长一段时间以来,大数据社区已经普遍认识到了批量数据处理的不足.很多应用都对实时查询和流式处理产生了迫切需求.最近几年,在这个理念的推动下,催生出了一系列解决方案,Twitter Storm,Yah ...
- [翻译]Kafka Streams简介: 让流处理变得更简单
Introducing Kafka Streams: Stream Processing Made Simple 这是Jay Kreps在三月写的一篇文章,用来介绍Kafka Streams.当时Ka ...
- Kafka Streams简介: 让流处理变得更简单
Introducing Kafka Streams: Stream Processing Made Simple 这是Jay Kreps在三月写的一篇文章,用来介绍Kafka Streams.当时Ka ...
- All the Apache Streaming Projects: An Exploratory Guide
The speed at which data is generated, consumed, processed, and analyzed is increasing at an unbeliev ...
- 分布式流式计算平台——S4
本文是作者在充分阅读和理解Yahoo!最新发布的技术论文<S4:Distributed Stream Computing Platform>的基础上,所做出的知识分享. S4是Yahoo! ...
随机推荐
- Pyspark笔记一
1. pyspark读csv文件后无法显示中文 #pyspark读取csv格式时,不能显示中文 df = spark.read.csv(r"hdfs://mymaster:8020/user ...
- Python 写入训练日志文件并控制台输出
1. 背景 在深度学习的任务中,通常需要比较长时间的训练,因此我们会选择离开电脑.笔者在跟踪模型表现, 观察模型accuracy 以及 loss 的时候,比较传统的方法是在控制台print输出或者直接 ...
- GlusterFS分布式存储系统
一,分布式文件系统理论基础 1.1 分布式文件系统出现 计算机通过文件系统管理,存储数据,而现在数据信息爆炸的时代中人们可以获取的数据成指数倍的增长,单纯通过增加硬盘个数来扩展计算机文件系统的存储容量 ...
- vue---子调父 $emit (把子组件的数据传给父组件)
ps:App.vue 父组件 Hello.vue 子组件 ps:App.vue 父组件 Hello.vue 子组件 <!--App.vue :--> <template> & ...
- python3 jieba分词
一.jieba库用于分词,https://github.com/fxsjy/jieba 二.分词:分词精细:全局(文本分析)<精确(快速成词)<搜素(搜素引擎分词) #分词 str=r'今 ...
- EntityFramework 事物引发的问题
前记 还是最近做的日志模块,今天做最后的入库工作.在测试入库日志记录时,总是出现怪异的问题. 开启服务开始接收 Kafka 的消息,第一条数据没有问题,后面的都如不了库.很是懵~~~ 调试了很久定位在 ...
- Vue、webpack中默认的config.js、index.js 配置详情
在vue.js 框架搭建好后,其vue-cli 自动构建的目录里面相关环境变量及其基本变量配置,如下代码所示: module.exports = { build: { index: path.reso ...
- U盘损坏?
- 阿里云轻量级服务器和NGINX部署Django项目
部署条件: 1.一台阿里云服务器(本人的是CentOS系统的服务器) 2.已经构建好的项目 3.服务器上安装并配置Nginx 首先第一步:在服务器上安装并配置Nginx 进入服务器 $ ssh roo ...
- storm 安装配置
1.1.下载安装包 storm.apache.org 配置zookeeper:http://www.cnblogs.com/eggplantpro/p/7120893.html 1.2.解压安装包ta ...