Spark Streaming vs. Storm: A Comparison

Feature	Storm (Trident)	Spark Streaming	Notes
Parallelism model	Task-parallel continuous computation engine using a DAG	Data-parallel, general-purpose batch processing engine built on Spark
Processing model	One at a time: each event (message) is processed individually; Trident: micro-batch, several events per batch	Micro-batch: several events per batch
Latency	Sub-second; Trident: a few seconds	A few seconds	From the comment thread on the source post. Josh (December 8, 2014 at 4:23 PM): Thanks for the article! Could you please explain this point in a bit more detail? "But, it relies on transactions to update state, which is slower and often has to be implemented by the user." If I want to write my output to a persistent store e.g. redis, then why would it be slower in Storm than in Spark Streaming? Xinh Huynh (December 13, 2014 at 5:24 PM): Hi Josh, please check out the slide about Storm/Trident here: http://spark-summit.org/wp-content/uploads/2013/10/Spark-Summit-2013-Spark-Streaming.pdf If you want exactly-once semantics with Trident, you have to store a per-state transaction ID for each state. I.e., in word-count, for each word, you would store both the count as well as a transaction ID; each key-value pair would look like: (Key: word, Value: count, txid). Before updating the count, you would read in the old transaction ID to make sure it's up to date, and this read causes extra latency. If you are using redis in memory, that might be okay, but if it has to go to disk then that would add noticeable latency to the update. Whereas in Spark, you don't have to store a per-state transaction ID. For the details of Trident transactional processing, see http://storm.apache.org/documentation/Trident-state Josh (December 15, 2014 at 9:18 AM): Hi Xinh, thanks for the explanation. I see, isn't that similar to Spark checkpointing, where it saves states to HDFS every ~10 seconds? Or is your point that with Storm it would (by default) persist the state much more frequently than Spark? Xinh Huynh (December 15, 2014 at 11:43 AM): Hi Josh, yes, the fault tolerance in Spark involves periodic (~10 second) checkpointing of RDDs. Yes, my point is that with Storm Trident the persistence occurs when each batch is processed, and by default that occurs a lot more than once every 10 seconds. And, in tuning any of these parameters, there's a tradeoff in the frequency of persistence vs. recovery time in the case of failure.
Fault tolerance (delivery guarantee)	At least once; Trident: exactly once	Exactly once
Origin	BackType and Twitter	UC Berkeley
Implementation language	Clojure	Scala
API support	Java, Python, Ruby, etc.	Scala, Java, Python
Platform integration	N/A (based on ZooKeeper)	Spark (so real-time processing and historical-data processing can be unified or share the same code)
Production use and support	Storm has been around for several years and has run in production at Twitter since 2011, as well as at many other companies	Meanwhile, Spark Streaming is a newer project; its only production deployment (that I am aware of) has been at Sharethrough since 2013.
Hadoop distribution support	Storm is the streaming solution in the Hortonworks Hadoop data platform	Spark Streaming is in both MapR's distribution and Cloudera's Enterprise data platform. Databricks also provides support for Spark.
Cluster integration and deployment	Depends on ZooKeeper; standalone, Mesos	Standalone, YARN, Mesos
Google Trends	(chart not preserved)
Issue burndown chart	https://issues.apache.org/jira/browse/STORM/	https://issues.apache.org/jira/browse/SPARK/	Judging by the two JIRA trackers, Spark issues are resolved far more promptly than Storm issues
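
The comment exchange in the latency row describes how Trident gets exactly-once updates: each key stores its value together with the ID of the last transaction (batch) applied, and that ID is read back before every write. The sketch below illustrates only that idea in plain Python. It is not the Trident API: the class `TransactionalWordCount`, its method names, and the in-memory dict standing in for Redis or another persistent store are all hypothetical, and it assumes a replayed batch carries the same txid and the same contents as the original.

```python
from collections import Counter


class TransactionalWordCount:
    """Hypothetical store keeping (count, txid) per word, as in the comment thread."""

    def __init__(self):
        # word -> (count, txid of the last batch applied to this word).
        # A dict stands in for an external store such as Redis (assumption).
        self.state = {}

    def apply_batch(self, txid, words):
        """Apply one micro-batch; replaying the same txid leaves the state unchanged."""
        for word, delta in Counter(words).items():
            # This read of the stored txid is the extra step that adds latency
            # when the store has to go to disk.
            count, last_txid = self.state.get(word, (0, None))
            if last_txid == txid:
                continue  # this batch was already applied to this key
            self.state[word] = (count + delta, txid)

    def count(self, word):
        return self.state.get(word, (0, None))[0]


store = TransactionalWordCount()
store.apply_batch(1, ["storm", "spark", "spark"])
store.apply_batch(1, ["storm", "spark", "spark"])  # replay after a simulated failure
store.apply_batch(2, ["spark"])
print(store.count("spark"))  # 3
print(store.count("storm"))  # 1
```

Without the stored txid, the replayed batch would be counted twice, which is the at-least-once behaviour listed for plain Storm in the fault tolerance row.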

The debate between Spark Streaming and Storm goes back a long way.

References:

http://xinhstechblog.blogspot.com/2014/06/storm-vs-spark-streaming-side-by-side.html

http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming

http://www.zdatainc.com/2014/09/apache-storm-apache-spark/
