spark streaming 1: SparkContex

/**
* Main entry point for Spark Streaming functionality. It provides methods used to create
* [[org.apache.spark.streaming.dstream.DStream]]s from various input sources. It can be either
* created by providing a Spark master URL and an appName, or from a org.apache.spark.SparkConf
* configuration (see core Spark documentation), or from an existing org.apache.spark.SparkContext.
* The associated SparkContext can be accessed using `context.sparkContext`. After
* creating and transforming DStreams, the streaming computation can be started and stopped
* using `context.start()` and `context.stop()`, respectively.
* `context.awaitTransformation()` allows the current thread to wait for the termination
* of the context by `stop()` or by an exception.
*/
class StreamingContext private[streaming] (
sc_ : SparkContext,
cp_ : Checkpoint,
batchDur_ : Duration
) extends Logging {
private[streaming] val graph: DStreamGraph = {
if (isCheckpointPresent) {
cp_.graph.setContext(this)
cp_.graph.restoreCheckpointData()
cp_.graph
} else {
assert(batchDur_ != null, "Batch duration for streaming context cannot be null")
val newGraph = new DStreamGraph()
newGraph.setBatchDuration(batchDur_)
newGraph
}
}
private val nextReceiverInputStreamId = new AtomicInteger(0)
private[streaming] val scheduler = new JobScheduler(this)
private[streaming] val waiter = new ContextWaiter
private[streaming] val progressListener = new StreamingJobProgressListener(this)
/** Add a [[org.apache.spark.streaming.scheduler.StreamingListener]] object for
* receiving system events related to streaming.
*/
def addStreamingListener(streamingListener: StreamingListener) {
scheduler.listenerBus.addListener(streamingListener)
}
/**
* Start the execution of the streams.
*
* @throws SparkException if the context has already been started or stopped.
*/
def start(): Unit = synchronized {
if (state == Started) {
throw new SparkException("StreamingContext has already been started")
}
if (state == Stopped) {
throw new SparkException("StreamingContext has already been stopped")
}
validate()
sparkContext.setCallSite(DStream.getCreationSite())
scheduler.start()
state = Started
}
class PluggableInputDStream[T: ClassTag](
@transient ssc_ : StreamingContext,
receiver: Receiver[T]) extends ReceiverInputDStream[T](ssc_) {
def getReceiver(): Receiver[T] = {
receiver
}
}
class SocketInputDStream[T: ClassTag](
@transient ssc_ : StreamingContext,
host: String,
port: Int,
bytesToObjects: InputStream => Iterator[T],
storageLevel: StorageLevel
) extends ReceiverInputDStream[T](ssc_) {
def getReceiver(): Receiver[T] = {
new SocketReceiver(host, port, bytesToObjects, storageLevel)
}
}
/**
* An input stream that reads blocks of serialized objects from a given network address.
* The blocks will be inserted directly into the block store. This is the fastest way to get
* data into Spark Streaming, though it requires the sender to batch data and serialize it
* in the format that the system is configured with.
*/
private[streaming]
class RawInputDStream[T: ClassTag](
@transient ssc_ : StreamingContext,
host: String,
port: Int,
storageLevel: StorageLevel
) extends ReceiverInputDStream[T](ssc_ ) with Logging {
def getReceiver(): Receiver[T] = {
new RawNetworkReceiver(host, port, storageLevel).asInstanceOf[Receiver[T]]
}
}
/**
* Create a input stream that monitors a Hadoop-compatible filesystem
* for new files and reads them using the given key-value types and input format.
* Files must be written to the monitored directory by "moving" them from another
* location within the same file system. File names starting with . are ignored.
* @param directory HDFS directory to monitor for new file
* @tparam K Key type for reading HDFS file
* @tparam V Value type for reading HDFS file
* @tparam F Input format for reading HDFS file
*/
def fileStream[
K: ClassTag,
V: ClassTag,
F <: NewInputFormat[K, V]: ClassTag
] (directory: String): InputDStream[(K, V)] = {
new FileInputDStream[K, V, F](this, directory)
}
private[streaming]
class TransformedDStream[U: ClassTag] (
parents: Seq[DStream[_]],
transformFunc: (Seq[RDD[_]], Time) => RDD[U]
) extends DStream[U](parents.head.ssc) {
require(parents.length > 0, "List of DStreams to transform is empty")
require(parents.map(_.ssc).distinct.size == 1, "Some of the DStreams have different contexts")
require(parents.map(_.slideDuration).distinct.size == 1,
"Some of the DStreams have different slide durations")
override def dependencies = parents.toList
override def slideDuration: Duration = parents.head.slideDuration
override def compute(validTime: Time): Option[RDD[U]] = {
val parentRDDs = parents.map(_.getOrCompute(validTime).orNull).toSeq
Some(transformFunc(parentRDDs, validTime))
}
}
spark streaming 1: SparkContex的更多相关文章
- Spark踩坑记——Spark Streaming+Kafka
[TOC] 前言 在WeTest舆情项目中,需要对每天千万级的游戏评论信息进行词频统计,在生产者一端,我们将数据按照每天的拉取时间存入了Kafka当中,而在消费者一端,我们利用了spark strea ...
- Spark Streaming+Kafka
Spark Streaming+Kafka 前言 在WeTest舆情项目中,需要对每天千万级的游戏评论信息进行词频统计,在生产者一端,我们将数据按照每天的拉取时间存入了Kafka当中,而在消费者一端, ...
- Storm介绍及与Spark Streaming对比
Storm介绍 Storm是由Twitter开源的分布式.高容错的实时处理系统,它的出现令持续不断的流计算变得容易,弥补了Hadoop批处理所不能满足的实时要求.Storm常用于在实时分析.在线机器学 ...
- flume+kafka+spark streaming整合
1.安装好flume2.安装好kafka3.安装好spark4.流程说明: 日志文件->flume->kafka->spark streaming flume输入:文件 flume输 ...
- spark streaming kafka example
// scalastyle:off println package org.apache.spark.examples.streaming import kafka.serializer.String ...
- Spark Streaming中动态Batch Size实现初探
本期内容 : BatchDuration与 Process Time 动态Batch Size Spark Streaming中有很多算子,是否每一个算子都是预期中的类似线性规律的时间消耗呢? 例如: ...
- Spark Streaming源码解读之No Receivers彻底思考
本期内容 : Direct Acess Kafka Spark Streaming接收数据现在支持的两种方式: 01. Receiver的方式来接收数据,及输入数据的控制 02. No Receive ...
- Spark Streaming架构设计和运行机制总结
本期内容 : Spark Streaming中的架构设计和运行机制 Spark Streaming深度思考 Spark Streaming的本质就是在RDD基础之上加上Time ,由Time不断的运行 ...
- Spark Streaming中空RDD处理及流处理程序优雅的停止
本期内容 : Spark Streaming中的空RDD处理 Spark Streaming程序的停止 由于Spark Streaming的每个BatchDuration都会不断的产生RDD,空RDD ...
随机推荐
- sql limit i,n
limit i,n; i:为查询结果的索引值(默认从0开始),当i=0时可省略i n:为查询结果返回的数量(也就是条数) 表示从第i+1条开始,取n条数据 limit n 等同于 limit 0,n ...
- c# log4net安装时在AssemblyInfo中提示找不到log4net解决办法
在安装log4net时,按照安装手册需要在AssemblyInfo.cs里添加log4net的配置信息 [assembly: log4net.Config.XmlConfigurator(Config ...
- 常用的排序算法介绍和在JAVA的实现(二)
一.写随笔的原因:本文接上次的常用的排序算法介绍和在JAVA的实现(一) 二.具体的内容: 3.交换排序 交换排序:通过交换元素之间的位置来实现排序. 交换排序又可细分为:冒泡排序,快速排序 (1)冒 ...
- pat (B)_1002
1002 写出这个数 (20 分) 读入一个正整数 n,计算其各位数字之和,用汉语拼音写出和的每一位数字. 输入格式: 每个测试输入包含 1 个测试用例,即给出自然数 n 的值.这里保证 n 小于 1 ...
- Win10下注册APlayer组件的正确姿势
1. 官网下载SDK 和 解码器 APlayer媒体播放引擎 2.解压SDK和解码器,把解码器codecs文件夹内所有文件复制到SDK文件夹内的bin\codecs目录里面 3.使用管理员权限打开CM ...
- --set-upstream新版本不在支持
--set-upstream最新版本貌似不在支持,使用--track和--set-uptream-to来替代 --set-upstream: git branch --set-upstream [本地 ...
- docker资源隔离实现方式
默认情况下,一个容器没有资源限制,几乎可以使用宿主主机的所有资源.docker提供了控制内存.cpu.block io.但是实际上主要是namespace和cgroup控制资源的隔离. Docker的 ...
- Istio 1.4 部署指南
原文链接:Istio 1.4 部署指南 Istio 一直处于快速迭代更新的过程中,它的部署方法也在不断更新,之前我在 1.0 版本中介绍的安装方法,对于最新的 1.4 版本已经不适用了.以后主流的部署 ...
- Mysql数据同步Elasticsearch方案总结
Mysql数据同步Elasticsearch方案总结 https://my.oschina.net/u/4000872/blog/2252620
- Django—ajax、前端后端编码格式,bulk_create批量插入语数据库、自定义分页
一.ajax简介: XML也是一门标记语言该语法应用场景 1.写配置文件 2.可以写前端页面(odoo框架中 erp) 每家公司都会有属于这家公司独有的内部管理软件:专门用来开发企业内部管理软件 框架 ...