Feature / Storm (Trident) / Spark Streaming / Notes
Parallel framework
Storm: DAG-based task-parallel computation engine (task parallel continuous computational engine using DAG)
Spark Streaming: Spark-based data-parallel computation engine (data parallel general purpose batch processing engine)

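Data processing model
Storm: one at a time, each event (message) is processed individually; Trident: micro-batch, several events are processed together
Spark Streaming: micro-batch, several events are processed together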
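
The difference between the two processing models can be sketched in a few lines of Python. This is only an illustration, not Storm or Spark code: the function names are hypothetical, and batches are cut by event count for simplicity, whereas Spark Streaming cuts batches by a time interval.

```python
from typing import Callable, Iterable, Iterator, List

Handler = Callable[[List[int]], None]


def process_one_at_a_time(events: Iterable[int], handle: Handler) -> int:
    """Call the handler once per event, as each event arrives."""
    calls = 0
    for event in events:
        handle([event])
        calls += 1
    return calls


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group events into batches of at most batch_size (assumption: count-based)."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def process_micro_batch(events: Iterable[int], handle: Handler, batch_size: int) -> int:
    """Call the handler once per batch instead of once per event."""
    calls = 0
    for batch in micro_batches(events, batch_size):
        handle(batch)
        calls += 1
    return calls
```

With seven events and a batch size of three, the handler runs seven times in the first model and three times in the second. Fewer handler calls usually help throughput, but an event has to wait for its batch to fill, which is where the extra latency of micro-batching comes from.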

Latency
Storm: under one second; Trident: a few seconds
Spark Streaming: a few seconds

Thanks for the article!
Could you please explain this point in a bit more detail? "But, it relies on transactions to update state, which is slower and often has to be implemented by the user."
If I want to write my output to a persistent store e.g. redis, then why would it be slower in Storm than in Spark Streaming?


Replies
  1. Hi Josh, please check out the slide about Storm/Trident here: http://spark-summit.org/wp-content/uploads/2013/10/Spark-Summit-2013-Spark-Streaming.pdf
    If you want exactly-once semantics with Trident, you have to store a per-state transaction ID for each state. I.e., in word-count, for each word, you would store both the count as well as a transaction ID; each key-value pair would look like: (Key:word, Value: count, txid). Before updating the count, you would read in the old transaction ID to make sure it's up to date, and this read causes extra latency. If you are using redis in memory, that might be okay, but if it has to go to disk then that would add noticeable latency to the update. Whereas in Spark, you don't have to store a per-state transaction ID.
    For the details of Trident transactional processing, see http://storm.apache.org/documentation/Trident-state

  2. Hi Xinh, thanks for the explanation. I see, isn't that similar to Spark checkpointing - where it saves states to HDFS every ~10 seconds? or is your point that with Storm it would (by default) persist the state much more frequently than Spark?

  3. Hi Josh, yes, the fault tolerance in Spark involves periodic (~10 second) checkpointing of RDDs. Yes, my point is that with Storm Trident the persistence occurs when each batch is processed, and by default that occurs a lot more than once every 10 seconds. And, in tuning any of these parameters, there's a tradeoff in the frequency of persistence vs. recovery time in the case of failure.
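
The per-key transaction ID described in reply 1 can be sketched as follows. This is a minimal illustration, not Trident's API: `apply_batch` and `State` are hypothetical names, a Python dict stands in for the external store (for example Redis), and it assumes that a replayed batch carries the same txid and the same contents as the original.

```python
from typing import Dict, List, Optional, Tuple

# word -> (count, txid of the last batch applied to that word)
State = Dict[str, Tuple[int, Optional[int]]]


def apply_batch(state: State, txid: int, words: List[str]) -> None:
    """Apply one batch of words to the counts with exactly-once semantics."""
    # Pre-aggregate inside the batch so each key is written at most once.
    partial: Dict[str, int] = {}
    for word in words:
        partial[word] = partial.get(word, 0) + 1

    for word, delta in partial.items():
        # This read of the stored txid is the extra round trip
        # mentioned in the reply above.
        count, last_txid = state.get(word, (0, None))
        if last_txid == txid:
            continue  # batch already applied to this key, so a replay is skipped
        state[word] = (count + delta, txid)


state: State = {}
apply_batch(state, 1, ["spark", "storm", "spark"])
apply_batch(state, 1, ["spark", "storm", "spark"])  # replay after a failure
apply_batch(state, 2, ["spark"])
```

After these three calls the replayed batch has not been counted twice. The price is one read per key before every write, which is cheap against an in-memory store and noticeably slower when the store has to go to disk.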

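Fault tolerance
Storm: at least once; Trident: exactly once
Spark Streaming: exactly once

Origin
Storm: BackType and Twitter
Spark Streaming: UC Berkeley (UCB)

Implementation language
Storm: Clojure
Spark Streaming: Scala

API support
Storm: Java, Python, Ruby, and others
Spark Streaming: Scala, Java, Python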

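Platform integration
Storm: N/A (relies on ZooKeeper)
Spark Streaming: Spark (so real-time processing and historical data processing can be unified or shared)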

Production use and support
Storm: has been around for several years and has run in production at Twitter since 2011, as well as at many other companies.
Spark Streaming: a newer project; its only production deployment (that I am aware of) has been at Sharethrough since 2013.

Computing framework
Storm: the streaming solution in the Hortonworks Hadoop data platform.
Spark Streaming: included in both MapR's distribution and Cloudera's Enterprise data platform; Databricks also supports the Spark stack.

Cluster integration and deployment
Storm: depends on ZooKeeper; standalone, Mesos
Spark Streaming: standalone, YARN, Mesos

Google Trends



Bug burndown charts

Storm: https://issues.apache.org/jira/browse/STORM/

Spark: https://issues.apache.org/jira/browse/SPARK/
These show that Spark issues are resolved far more promptly than Storm issues.








