spark-streaming的checkpoint机制源码分析
转发请注明原创地址 http://www.cnblogs.com/dongxiao-yang/p/7994357.html
spark-streaming定时对 DStreamGraph 和 JobScheduler 做 Checkpoint,来记录整个 DStreamGraph 的变化和每个 batch 的 job 的完成情况,Checkpoint 发起的间隔默认的是和 batchDuration 一致;即每次 batch 发起、提交了需要运行的 job 后就做 Checkpoint。另外在 job 完成了更新任务状态的时候再次做一下 Checkpoint。
一 checkpoint生成
job生成
private def generateJobs(time: Time) {
// Checkpoint all RDDs marked for checkpointing to ensure their lineages are
// truncated periodically. Otherwise, we may run into stack overflows (SPARK-6847).
ssc.sparkContext.setLocalProperty(RDD.CHECKPOINT_ALL_MARKED_ANCESTORS, "true")
Try {
jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
graph.generateJobs(time) // generate jobs using allocated block
} match {
case Success(jobs) =>
val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time)
jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
case Failure(e) =>
jobScheduler.reportError("Error generating jobs for time " + time, e)
PythonDStream.stopStreamingContextIfPythonProcessIsDead(e)
}
eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))
}
job 完成
private def clearMetadata(time: Time) {
ssc.graph.clearMetadata(time)
// If checkpointing is enabled, then checkpoint,
// else mark batch to be fully processed
if (shouldCheckpoint) {
eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = true))
} else {
// If checkpointing is not enabled, then delete metadata information about
// received blocks (block data not saved in any case). Otherwise, wait for
// checkpointing of this batch to complete.
val maxRememberDuration = graph.getMaxInputStreamRememberDuration()
jobScheduler.receiverTracker.cleanupOldBlocksAndBatches(time - maxRememberDuration)
jobScheduler.inputInfoTracker.cleanup(time - maxRememberDuration)
markBatchFullyProcessed(time)
}
}
上文里面的eventLoop是JobGenerator内部的一个消息事件队列的封装,eventLoop内部会有一个后台线程不断的去消费事件,所以DoCheckpoint这种类型的事件会经过processEvent ->
doCheckpoint 由checkpointWriter把生成的Checkpoint对象写到外部存储:
/** Processes all events */
private def processEvent(event: JobGeneratorEvent) {
logDebug("Got event " + event)
event match {
case GenerateJobs(time) => generateJobs(time)
case ClearMetadata(time) => clearMetadata(time)
case DoCheckpoint(time, clearCheckpointDataLater) =>
doCheckpoint(time, clearCheckpointDataLater)
case ClearCheckpointData(time) => clearCheckpointData(time)
}
} /** Perform checkpoint for the give `time`. */
private def doCheckpoint(time: Time, clearCheckpointDataLater: Boolean) {
if (shouldCheckpoint && (time - graph.zeroTime).isMultipleOf(ssc.checkpointDuration)) {
logInfo("Checkpointing graph for time " + time)
ssc.graph.updateCheckpointData(time)
checkpointWriter.write(new Checkpoint(ssc, time), clearCheckpointDataLater)
}
}
doCheckpoint在调用checkpointWriter写数据到hdfs之前,首先会运行一下ssc.graph.updateCheckpointData(time),这个方法的主要作用是更新DStreamGraph里所有input和output stream对应的checkpointData属性,调用链路为DStreamGraph.updateCheckpointData -> Dstream.updateCheckpointData -> checkpointData.update
def updateCheckpointData(time: Time) {
logInfo("Updating checkpoint data for time " + time)
this.synchronized {
outputStreams.foreach(_.updateCheckpointData(time))
}
logInfo("Updated checkpoint data for time " + time)
}
private[streaming] def updateCheckpointData(currentTime: Time) {
logDebug(s"Updating checkpoint data for time $currentTime")
checkpointData.update(currentTime)
dependencies.foreach(_.updateCheckpointData(currentTime))
logDebug(s"Updated checkpoint data for time $currentTime: $checkpointData")
}
private[streaming]
class DirectKafkaInputDStreamCheckpointData extends DStreamCheckpointData(this) {
def batchForTime: mutable.HashMap[Time, Array[(String, Int, Long, Long)]] = {
data.asInstanceOf[mutable.HashMap[Time, Array[OffsetRange.OffsetRangeTuple]]]
}
override def update(time: Time): Unit = {
batchForTime.clear()
generatedRDDs.foreach { kv =>
val a = kv._2.asInstanceOf[KafkaRDD[K, V]].offsetRanges.map(_.toTuple).toArray
batchForTime += kv._1 -> a
}
}
override def cleanup(time: Time): Unit = { }
override def restore(): Unit = {
batchForTime.toSeq.sortBy(_._1)(Time.ordering).foreach { case (t, b) =>
logInfo(s"Restoring KafkaRDD for time $t ${b.mkString("[", ", ", "]")}")
generatedRDDs += t -> new KafkaRDD[K, V](
context.sparkContext,
executorKafkaParams,
b.map(OffsetRange(_)),
getPreferredHosts,
// during restore, it's possible same partition will be consumed from multiple
// threads, so dont use cache
false
)
}
}
}
以DirectKafkaInputDStream为例,代码里重写了checkpointData的update等接口,所以DirectKafkaInputDStream会在checkpoint之前把正在运行的kafkaRDD对应的topic,partition,fromoffset,untiloffset全部存储到checkpointData里面data这个HashMap的属性当中,用于写checkpoint时进行序列化。
一个checkpoint里面包含的对象列表如下:
class Checkpoint(ssc: StreamingContext, val checkpointTime: Time)
extends Logging with Serializable {
val master = ssc.sc.master
val framework = ssc.sc.appName
val jars = ssc.sc.jars
val graph = ssc.graph
val checkpointDir = ssc.checkpointDir
val checkpointDuration = ssc.checkpointDuration
val pendingTimes = ssc.scheduler.getPendingTimes().toArray
val sparkConfPairs = ssc.conf.getAll
二 从checkpoint恢复服务
spark-streaming启用checkpoint代码里的StreamingContext必须严格按照官方demo实例的架构使用,即所有的streaming逻辑都放在一个返回StreamingContext的createContext方法上,
通过StreamingContext.getOrCreate方法进行初始化,在CheckpointReader.read找到checkpoint文件并且成功反序列化出checkpoint对象后,返回一个包含该checkpoint对象的StreamingContext,这个时候程序里的createContext就不会被调用,反之如果程序是第一次启动会通过createContext初始化StreamingContext
def getOrCreate(
checkpointPath: String,
creatingFunc: () => StreamingContext,
hadoopConf: Configuration = SparkHadoopUtil.get.conf,
createOnError: Boolean = false
): StreamingContext = {
val checkpointOption = CheckpointReader.read(
checkpointPath, new SparkConf(), hadoopConf, createOnError)
checkpointOption.map(new StreamingContext(null, _, null)).getOrElse(creatingFunc())
} def read(
checkpointDir: String,
conf: SparkConf,
hadoopConf: Configuration,
ignoreReadError: Boolean = false): Option[Checkpoint] = {
val checkpointPath = new Path(checkpointDir) val fs = checkpointPath.getFileSystem(hadoopConf) // Try to find the checkpoint files
val checkpointFiles = Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).reverse
if (checkpointFiles.isEmpty) {
return None
} // Try to read the checkpoint files in the order
logInfo(s"Checkpoint files found: ${checkpointFiles.mkString(",")}")
var readError: Exception = null
checkpointFiles.foreach { file =>
logInfo(s"Attempting to load checkpoint from file $file")
try {
val fis = fs.open(file)
val cp = Checkpoint.deserialize(fis, conf)
logInfo(s"Checkpoint successfully loaded from file $file")
logInfo(s"Checkpoint was generated at time ${cp.checkpointTime}")
return Some(cp)
} catch {
case e: Exception =>
readError = e
logWarning(s"Error reading checkpoint from file $file", e)
}
} // If none of checkpoint files could be read, then throw exception
if (!ignoreReadError) {
throw new SparkException(
s"Failed to read checkpoint from directory $checkpointPath", readError)
}
None
}
}
在从checkpoint恢复的过程中DStreamGraph由checkpoint恢复,下文的代码调用路径StreamingContext.graph->DStreamGraph.restoreCheckpointData -> DStream.restoreCheckpointData->checkpointData.restore
private[streaming] val graph: DStreamGraph = {
if (isCheckpointPresent) {
_cp.graph.setContext(this)
_cp.graph.restoreCheckpointData()
_cp.graph
} else {
require(_batchDur != null, "Batch duration for StreamingContext cannot be null")
val newGraph = new DStreamGraph()
newGraph.setBatchDuration(_batchDur)
newGraph
}
}
def restoreCheckpointData() {
logInfo("Restoring checkpoint data")
this.synchronized {
outputStreams.foreach(_.restoreCheckpointData())
}
logInfo("Restored checkpoint data")
}
private[streaming] def restoreCheckpointData() {
if (!restoredFromCheckpointData) {
// Create RDDs from the checkpoint data
logInfo("Restoring checkpoint data")
checkpointData.restore()
dependencies.foreach(_.restoreCheckpointData())
restoredFromCheckpointData = true
logInfo("Restored checkpoint data")
}
}
override def restore(): Unit = {
batchForTime.toSeq.sortBy(_._1)(Time.ordering).foreach { case (t, b) =>
logInfo(s"Restoring KafkaRDD for time $t ${b.mkString("[", ", ", "]")}")
generatedRDDs += t -> new KafkaRDD[K, V](
context.sparkContext,
executorKafkaParams,
b.map(OffsetRange(_)),
getPreferredHosts,
// during restore, it's possible same partition will be consumed from multiple
// threads, so dont use cache
false
)
}
}
仍然以DirectKafkaInputDStreamCheckpointData为例,这个方法从上文保存的checkpoint.data对象里获取RDD运行时的对应信息恢复出停止时的KafkaRDD。
private def restart() {
// If manual clock is being used for testing, then
// either set the manual clock to the last checkpointed time,
// or if the property is defined set it to that time
if (clock.isInstanceOf[ManualClock]) {
val lastTime = ssc.initialCheckpoint.checkpointTime.milliseconds
val jumpTime = ssc.sc.conf.getLong("spark.streaming.manualClock.jump", 0)
clock.asInstanceOf[ManualClock].setTime(lastTime + jumpTime)
}
val batchDuration = ssc.graph.batchDuration
// Batches when the master was down, that is,
// between the checkpoint and current restart time
val checkpointTime = ssc.initialCheckpoint.checkpointTime
val restartTime = new Time(timer.getRestartTime(graph.zeroTime.milliseconds))
val downTimes = checkpointTime.until(restartTime, batchDuration)
logInfo("Batches during down time (" + downTimes.size + " batches): "
+ downTimes.mkString(", "))
// Batches that were unprocessed before failure
val pendingTimes = ssc.initialCheckpoint.pendingTimes.sorted(Time.ordering)
logInfo("Batches pending processing (" + pendingTimes.length + " batches): " +
pendingTimes.mkString(", "))
// Reschedule jobs for these times
val timesToReschedule = (pendingTimes ++ downTimes).filter { _ < restartTime }
.distinct.sorted(Time.ordering)
logInfo("Batches to reschedule (" + timesToReschedule.length + " batches): " +
timesToReschedule.mkString(", "))
timesToReschedule.foreach { time =>
// Allocate the related blocks when recovering from failure, because some blocks that were
// added but not allocated, are dangling in the queue after recovering, we have to allocate
// those blocks to the next batch, which is the batch they were supposed to go.
jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
jobScheduler.submitJobSet(JobSet(time, graph.generateJobs(time)))
}
// Restart the timer
timer.start(restartTime.milliseconds)
logInfo("Restarted JobGenerator at " + restartTime)
}
最后,在restart的过程中,JobGenerator通过当前时间和上次程序停止的时间计算出程序重启过程中共有多少batch没有生成,加上上一次程序死掉的过程中有多少正在运行的job,全部
进行Reschedule,补跑任务。
参考文档
2Spark Streaming揭秘 Day33 checkpoint的使用
spark-streaming的checkpoint机制源码分析的更多相关文章
- 《深入理解Spark:核心思想与源码分析》(前言及第1章)
自己牺牲了7个月的周末和下班空闲时间,通过研究Spark源码和原理,总结整理的<深入理解Spark:核心思想与源码分析>一书现在已经正式出版上市,目前亚马逊.京东.当当.天猫等网站均有销售 ...
- 《深入理解Spark:核心思想与源码分析》(第2章)
<深入理解Spark:核心思想与源码分析>一书前言的内容请看链接<深入理解SPARK:核心思想与源码分析>一书正式出版上市 <深入理解Spark:核心思想与源码分析> ...
- 《深入理解Spark:核心思想与源码分析》一书正式出版上市
自己牺牲了7个月的周末和下班空闲时间,通过研究Spark源码和原理,总结整理的<深入理解Spark:核心思想与源码分析>一书现在已经正式出版上市,目前亚马逊.京东.当当.天猫等网站均有销售 ...
- 《深入理解Spark:核心思想与源码分析》正式出版上市
自己牺牲了7个月的周末和下班空闲时间,通过研究Spark源码和原理,总结整理的<深入理解Spark:核心思想与源码分析>一书现在已经正式出版上市,目前亚马逊.京东.当当.天猫等网站均有销售 ...
- Spark Streaming揭秘 Day26 JobGenerator源码图解
Spark Streaming揭秘 Day26 JobGenerator源码图解 今天主要解析一下JobGenerator,它相当于一个转换器,和机器学习的pipeline比较类似,因为最终运行在Sp ...
- 《深入理解Spark:核心思想与源码分析》——SparkContext的初始化(叔篇)——TaskScheduler的启动
<深入理解Spark:核心思想与源码分析>一书前言的内容请看链接<深入理解SPARK:核心思想与源码分析>一书正式出版上市 <深入理解Spark:核心思想与源码分析> ...
- Spark Streaming揭秘 Day22 架构源码图解
Spark Streaming揭秘 Day22 架构源码图解 今天主要是通过图解的方式,对SparkStreaming的架构进行一下回顾. 下面这个是其官方标准的流程描述. SparkStreamin ...
- Springboot学习04-默认错误页面加载机制源码分析
Springboot学习04-默认错误页面加载机制源码分析 前沿 希望通过本文的学习,对错误页面的加载机制有这更神的理解 正文 1-Springboot错误页面展示 2-Springboot默认错误处 ...
- ApplicationEvent事件机制源码分析
<spring扩展点之三:Spring 的监听事件 ApplicationListener 和 ApplicationEvent 用法,在spring启动后做些事情> <服务网关zu ...
随机推荐
- luogu P1056 排座椅
题目描述 上课的时候总会有一些同学和前后左右的人交头接耳,这是令小学班主任十分头疼的一件事情.不过,班主任小雪发现了一些有趣的现象,当同学们的座次确定下来之后,只有有限的D对同学上课时会交头接耳.同学 ...
- 【莫队算法】URAL - 2080 - Wallet
http://www.cnblogs.com/icode-girl/p/5783983.html 要注意卡片没有都被使用的情况. #include<cstdio> #include< ...
- python3开发进阶-Django框架中的ORM的常用操作的补充(F查询和Q查询,事务)
阅读目录 F查询和Q查询 事务 一.F查询和Q查询 1.F查询 查询前的准备 class Product(models.Model): name = models.CharField(max_leng ...
- microsoft visual studio遇到了问题,需要关闭
http://www.microsoft.com/zh-cn/download/confirmation.aspx?id=13821 装上这个补丁: WindowsXP-KB971513-x86-CH ...
- VUE -- Vue.js每天必学之计算属性computed与$watch
在模板中绑定表达式是非常便利的,但是它们实际上只用于简单的操作.模板是为了描述视图的结构.在模板中放入太多的逻辑会让模板过重且难以维护.这就是为什么 Vue.js 将绑定表达式限制为一个表达式.如果需 ...
- WebLogic Server 多租户资源迁移
重新建立一个动态集群,并启动,注意监听地址不能和其他集群重合 选择相应的资源组进行迁移, 迁移后,访问新的地址成功. 通过OTD负载均衡器访问原有的地址成功. 直接访问原来后台地址失败,表示资源确实已 ...
- jenkins结合docker
参考:https://m.aliyun.com/yunqi/articles/80459?spm=5176.mtagdetail.0.0.vJJ8Gj 上面这篇文章讲述了一种工作思路:CICD(持续集 ...
- FL2440 ubifs文件系统烧录遇到的问题——内核分区的重要性
之前用的文件系统是initramfs的,这种文件系统是编译进内核里的,而开机之后内核是写在内存中的,所以每次掉电之后写进文件系统中的东西都会丢失.所以决定换成ubifs的文件系统.这种文件系统是跟内核 ...
- springMVC中Restful支持
RESTFul支持 http://localhost:8090/user/doAdd.action?username=tony&age=8 http://localhost:8090/user ...
- Spark导论(Spark自学一)
1.1 Spark是什么? Spark是一个用来实现快速而通用的集群计算的平台. 1.2 一个大一统的软件栈 Spark项目包含多个紧密集成的组件. 1.2.1 Spark Core Spark Co ...