Spark Streaming - DStream
map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. step2 Define the streaming computations by applying transformation and output operations to DStreams.
step3 Start receiving data and processing it using streamingContext.start().
step4 Wait for the processing to be stopped (manually or due to any error) using streamingContext.awaitTermination().
step5 The processing can be manually stopped using streamingContext.stop().
Points to remember:
- Once a context has been started, no new streaming computations can be set up or added to it.
- Once a context has been stopped, it cannot be restarted.
- Only one StreamingContext can be active in a JVM at the same time.
- stop() on StreamingContext also stops the SparkContext. To stop only the StreamingContext, set the optional parameter of
stop()calledstopSparkContextto false. - A SparkContext can be re-used to create multiple StreamingContexts, as long as the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created.
Points to remember
When running a Spark Streaming program locally, do not use “local” or “local[1]” as the master URL. Either of these means that only one thread will be used for running tasks locally. If you are using a input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will be used to run the receiver, leaving no thread for processing the received data. Hence, when running locally, always use “local[n]” as the master URL, where n > number of receivers to run (see Spark Properties for information on how to set the master).
Extending the logic to running on a cluster, the number of cores allocated to the Spark Streaming application must be more than the number of receivers. Otherwise the system will receive data, but not be able to process it.
window length - The duration of the window (3 in the figure).
sliding interval - The interval at which the window operation is performed (2 in the figure).
// Reduce last 30 seconds of data, every 10 seconds
val windowedWordCounts = pairs.reduceByKeyAndWindow((a:Int,b:Int) => (a + b), Seconds(30), Seconds(10))
4.1 Reducing the Batch Processing Times
Spark Streaming - DStream的更多相关文章
- 58、Spark Streaming: DStream的output操作以及foreachRDD详解
一.output操作 1.output操作 DStream中的所有计算,都是由output操作触发的,比如print().如果没有任何output操作,那么,压根儿就不会执行定义的计算逻辑. 此外,即 ...
- 54、Spark Streaming:DStream的transformation操作概览
一. transformation操作概览 Transformation Meaning map 对传入的每个元素,返回一个新的元素 flatMap 对传入的每个元素,返回一个或多个元素 filter ...
- spark streaming(2) DAG静态定义及DStream,DStreamGraph
DAG 中文名有向无环图.它不是spark独有技术.它是一种编程思想 ,甚至于hadoop阵营里也有运用DAG的技术,比如Tez,Oozie.有意思的是,Tez是从MapReduce的基础上深化而来的 ...
- 大数据技术之_19_Spark学习_04_Spark Streaming 应用解析 + Spark Streaming 概述、运行、解析 + DStream 的输入、转换、输出 + 优化
第1章 Spark Streaming 概述1.1 什么是 Spark Streaming1.2 为什么要学习 Spark Streaming1.3 Spark 与 Storm 的对比第2章 运行 S ...
- Spark Streaming源码分析 – DStream
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence o ...
- spark streaming 2: DStream
DStream是类似于RDD概念,是对数据的抽象封装.它是一序列的RDD,事实上,它大部分的操作都是对RDD支持的操作的封装,不同的是,每次DStream都要遍历它内部所有的RDD执行这些操作.它可以 ...
- Spark Streaming消费Kafka Direct方式数据零丢失实现
使用场景 Spark Streaming实时消费kafka数据的时候,程序停止或者Kafka节点挂掉会导致数据丢失,Spark Streaming也没有设置CheckPoint(据说比较鸡肋,虽然可以 ...
- Spark Streaming
Spark Streaming Spark Streaming 是Spark为了用户实现流式计算的模型. 数据源包括Kafka,Flume,HDFS等. DStream 离散化流(discretize ...
- spark streaming kafka1.4.1中的低阶api createDirectStream使用总结
转载:http://blog.csdn.net/ligt0610/article/details/47311771 由于目前每天需要从kafka中消费20亿条左右的消息,集群压力有点大,会导致job不 ...
随机推荐
- 【读书笔记 - Effective Java】05. 避免创建不必要的对象
1. 如果对象是不可变的(immutable),它就始终可以被重用. (1) 特别是String类型的对象. String str1 = new String("str"); // ...
- Vue项目用webpack打包后,预览时资源路径出错(文末有vue项目链接分享)
最近用vue写了一些项目,项目写完之后需要打包之后才能放到网上展示,所以在这里记录一下项目打包的过程以及遇到的一些问题. --------------------------------------- ...
- table表单制作个人简历
应用table表单,编程个人简历表单,同时运用了跨行rowspan和跨列colspan. <!DOCTYPE html> <html> <head> <met ...
- PHP实现长网址与短网址
原文地址:http://www.qqdeveloper.com/detail/29/1.html 什么是长链接.短链接 顾名思义,长链接就是一个很长的链接:短链接就是一个很短的链接.长链接可以生成短链 ...
- 【Java】abstract,final,static,private,protected,public的区别
[abstract]抽象的 1. abstract可以修饰类和成员方法,被abstract修饰的类称为抽象类,被abstract修饰成员方法叫抽象方法.抽象类不一定有抽象方法,但拥有抽象方法的类一定是 ...
- MySQL:数据存在则更新,不存在则插入
前提:表结构存在主键或唯一索引,插入数据包含主键或唯一索引而导致记录重复插入失败. 单条记录更新插入: ,,) ,b,c; 多条记录批量更新插入: ,,),(,,) ON DUPLICATE KEY ...
- 大数据学习--day05(嵌套循环、方法、递归)
嵌套循环.方法.递归 图形打印 public static void main(String[]arg) { /** * * * * * * */ // 3 2 1 0 // 1 3 5 for(in ...
- 10种简单的Java性能优化
你是否正打算优化hashCode()方法?是否想要绕开正则表达式?Lukas Eder介绍了很多简单方便的性能优化小贴士以及扩展程序性能的技巧. 最近“全网域(Web Scale)”一词被炒得火热,人 ...
- 『Python基础-14』匿名函数 `lambda`
匿名函数和关键字lambda 匿名函数就是没有名称的函数,也就是不再使用def语句定义的函数 在Python中,如果要声匿名函数,则需要使用lambda关键字 使用lambda声明的匿名函数能接收任何 ...
- 一个java新手配置IIS服务器的血泪史
接到一个二次开发项目,听说是asp页面,带着不要怂的态度于是接下了. 好嘛按照步骤来 1.了解需求:一个公司内部积分排名类型项目,已经被多次开发,我所需要的就是新增两个页面,一个是分店赛一个是分部赛. ...