Spark Streaming vs. Storm: A Comparison

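Feature	Storm (Trident)	Spark Streaming	Notes
Parallelism framework	DAG-based task-parallel computation engine (task-parallel continuous computational engine using a DAG)	Spark-based data-parallel computation engine (data-parallel general-purpose batch processing engine)
Processing model	One at a time: processes one event (message) at a time. Trident: micro-batch, processes multiple events at a time	Micro-batch: processes multiple events at a time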
Latency	Sub-second; Trident: a few seconds	A few seconds	From the comment thread on the first reference. Josh (December 8, 2014 at 4:23 PM): Thanks for the article! Could you please explain this point in a bit more detail? "But, it relies on transactions to update state, which is slower and often has to be implemented by the user." If I want to write my output to a persistent store, e.g. redis, then why would it be slower in Storm than in Spark Streaming? Xinh Huynh (December 13, 2014 at 5:24 PM): Hi Josh, please check out the slide about Storm/Trident here: http://spark-summit.org/wp-content/uploads/2013/10/Spark-Summit-2013-Spark-Streaming.pdf If you want exactly-once semantics with Trident, you have to store a per-state transaction ID for each state. I.e., in word-count, for each word, you would store both the count and a transaction ID; each key-value pair would look like: (Key: word, Value: count, txid). Before updating the count, you would read in the old transaction ID to make sure it's up to date, and this read causes extra latency. If you are using redis in memory, that might be okay, but if it has to go to disk then that would add noticeable latency to the update. Whereas in Spark, you don't have to store a per-state transaction ID. For the details of Trident transactional processing, see http://storm.apache.org/documentation/Trident-state Josh (December 15, 2014 at 9:18 AM): Hi Xinh, thanks for the explanation. I see, isn't that similar to Spark checkpointing, where it saves states to HDFS every ~10 seconds? Or is your point that with Storm it would (by default) persist the state much more frequently than Spark? Xinh Huynh (December 15, 2014 at 11:43 AM): Hi Josh, yes, the fault tolerance in Spark involves periodic (~10 second) checkpointing of RDDs. Yes, my point is that with Storm Trident the persistence occurs when each batch is processed, and by default that occurs a lot more than once every 10 seconds. And, in tuning any of these parameters, there's a tradeoff between the frequency of persistence and the recovery time in the case of failure.
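Fault tolerance	At least once; Trident: exactly once	Exactly once
Origin	BackType and Twitter	UC Berkeley (UCB)
Implementation language	Clojure	Scala
API support	Java, Python, Ruby, etc.	Scala, Java, Python
Platform integration	N/A (based on ZooKeeper)	Spark (so real-time processing and historical data processing can be unified or shared)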
Production use and support	Storm has been around for several years and has run in production at Twitter since 2011, as well as at many other companies	Spark Streaming is a newer project; its only production deployment known to the author of the first reference has been at Sharethrough since 2013.
Hadoop distribution support	Storm is the streaming solution in the Hortonworks Hadoop data platform	Spark Streaming is in both MapR's distribution and Cloudera's Enterprise data platform. Databricks
Cluster integration and deployment	Depends on ZooKeeper; standalone, Mesos	Standalone, YARN, Mesos
Google Trends
Bug burndown chart	https://issues.apache.org/jira/browse/STORM/	https://issues.apache.org/jira/browse/SPARK/	This shows that Spark issues are resolved much more promptly than Storm issues
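
The notes on the latency row describe how Trident reaches exactly-once semantics by storing a transaction ID next to each value, as in (Key: word, Value: count, txid). The following is a minimal Python sketch of that idea, not Trident's actual API: the class `TransactionalWordCount`, the method `apply_batch`, and the in-memory dict standing in for redis or another store are all hypothetical, and it assumes that transaction IDs are non-negative integers and that a replayed batch carries the same txid and the same contents.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TransactionalWordCount:
    """Toy in-memory stand-in for a transactional state (hypothetical).

    Each key maps to (count, txid), mirroring the (word, count, txid)
    layout described in the comment thread.
    """
    store: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def apply_batch(self, txid: int, words: List[str]) -> None:
        # Aggregate the batch first so each key is read and written once.
        partial: Dict[str, int] = {}
        for word in words:
            partial[word] = partial.get(word, 0) + 1

        for word, delta in partial.items():
            # The extra read mentioned in the thread: fetch the stored
            # txid before updating. -1 is an assumed "never written" marker.
            count, last_txid = self.store.get(word, (0, -1))
            if last_txid == txid:
                # This batch was already applied to this key (a replay).
                continue
            self.store[word] = (count + delta, txid)


state = TransactionalWordCount()
state.apply_batch(1, ["storm", "spark", "storm"])
state.apply_batch(1, ["storm", "spark", "storm"])  # replay after a failure
state.apply_batch(2, ["spark"])
print(state.store)  # {'storm': (2, 1), 'spark': (2, 2)}
```

The replayed batch leaves the counts unchanged, which is the exactly-once effect. The price is one read per key before every write, and that read is the source of the extra latency discussed in the thread when the store is not in memory.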

The debate between Spark Streaming and Storm has a long history.

References:

http://xinhstechblog.blogspot.com/2014/06/storm-vs-spark-streaming-side-by-side.html

http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming

http://www.zdatainc.com/2014/09/apache-storm-apache-spark/

From WizNote
