Flink History Job

Writing history jobs
1. org.apache.flink.runtime.jobmanager, object JobManager
runJobManager specifies MemoryArchivist as the class used to archive jobs.
startJobManagerActors creates the actor that performs the archiving.
This archive actor is then passed to the JobManager actor.

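2. org.apache.flink.runtime.jobmanager, class JobManager
In handleMessage, when a JobStatusChanged message arrives, removeJob is called if certain conditions hold (for example, the job has reached a terminal state).
On a RemoveJob message, removeJob is called.
On a RemoveCachedJob message, removeJob is called.
On SubmitJob, if no leader is found, removeJob is called.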
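The hand-off in items 1 and 2 can be pictured with a plain-Java sketch. It is a hypothetical simplification, not Flink's actual actor code: the archivist is an ordinary object passed through the constructor rather than an actor reference, each message carries only a job ID, and the terminal flag on JobStatusChanged is an assumed stand-in for the real status check. The SubmitJob-without-leader path is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the JobManager -> archivist hand-off.
// These classes only borrow names from the text; they are not Flink's actor messages.
public class RemoveJobSketch {

    // Stand-in for the archive actor (a MemoryArchivist in Flink); records what it receives.
    static class Archivist {
        final List<String> archivedJobIds = new ArrayList<>();

        void archive(String jobId) {
            archivedJobIds.add(jobId);
        }
    }

    static class JobStatusChanged {
        final String jobId;
        final boolean terminal; // assumption: stands in for the real status check

        JobStatusChanged(String jobId, boolean terminal) {
            this.jobId = jobId;
            this.terminal = terminal;
        }
    }

    static class RemoveJob {
        final String jobId;
        RemoveJob(String jobId) { this.jobId = jobId; }
    }

    static class RemoveCachedJob {
        final String jobId;
        RemoveCachedJob(String jobId) { this.jobId = jobId; }
    }

    static class JobManager {
        private final Archivist archivist; // handed in at startup, like the archive actor

        JobManager(Archivist archivist) {
            this.archivist = archivist;
        }

        void handleMessage(Object msg) {
            if (msg instanceof JobStatusChanged) {
                JobStatusChanged m = (JobStatusChanged) msg;
                if (m.terminal) {
                    removeJob(m.jobId);
                }
            } else if (msg instanceof RemoveJob) {
                removeJob(((RemoveJob) msg).jobId);
            } else if (msg instanceof RemoveCachedJob) {
                removeJob(((RemoveCachedJob) msg).jobId);
            }
        }

        // All removal paths converge here; the job is handed to the archivist.
        private void removeJob(String jobId) {
            archivist.archive(jobId);
        }
    }

    public static List<String> demo() {
        Archivist archivist = new Archivist();
        JobManager jobManager = new JobManager(archivist);
        jobManager.handleMessage(new JobStatusChanged("job-1", false)); // not terminal: ignored
        jobManager.handleMessage(new JobStatusChanged("job-2", true));
        jobManager.handleMessage(new RemoveJob("job-3"));
        jobManager.handleMessage(new RemoveCachedJob("job-4"));
        return archivist.archivedJobIds;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [job-2, job-3, job-4]
    }
}
```

Funnelling every trigger through one removeJob method keeps the archiving logic in a single place, which is why the text can list several unrelated messages that all end in removeJob.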
3. MemoryArchivist
handleMessage calls the function that performs the persistence.
archiveJsonFiles passes the configured path and the execution graph to FsJobArchivist, which persists them.

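4. FsJobArchivist
archiveJob(Path rootPath, AccessExecutionGraph graph)
rootPath is the configured archive directory.
graph is the job's execution graph.
archiveJob first calls WebMonitorUtils.getJsonArchivists() to obtain the JSON archivists; that call is actually served by WebRuntimeMonitor.getJsonArchivists.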
The current archivists (each annotated with the REST path it produces) are:
new CurrentJobsOverviewHandler.CurrentJobsOverviewJsonArchivist(),//joboverview

new JobPlanHandler.JobPlanJsonArchivist(),//jobs/:jobid/plan
new JobConfigHandler.JobConfigJsonArchivist(),//jobs/:jobid/config
new JobExceptionsHandler.JobExceptionsJsonArchivist(),//jobs/:jobid/exceptions
new JobDetailsHandler.JobDetailsJsonArchivist(),//jobs/:jobid, jobs/:jobid/vertices
new JobAccumulatorsHandler.JobAccumulatorsJsonArchivist(),//jobs/:jobid/accumulators

new CheckpointStatsHandler.CheckpointStatsJsonArchivist(),//jobs/:jobid/checkpoints
new CheckpointConfigHandler.CheckpointConfigJsonArchivist(),//jobs/:jobid/checkpoints/config
new CheckpointStatsDetailsHandler.CheckpointStatsDetailsJsonArchivist(),//jobs/:jobid/checkpoints/details/:checkpointid
new CheckpointStatsDetailsSubtasksHandler.CheckpointStatsDetailsSubtasksJsonArchivist(),//jobs/:jobid/checkpoints/details/:checkpointid/subtasks/:vertexid

new JobVertexDetailsHandler.JobVertexDetailsJsonArchivist(),//jobs/:jobid/vertices/:vertexid
new SubtasksTimesHandler.SubtasksTimesJsonArchivist(),//jobs/:jobid/vertices/:vertexid/subtasktimes
new JobVertexTaskManagersHandler.JobVertexTaskManagersJsonArchivist(),//jobs/:jobid/vertices/:vertexid/taskmanagers
new JobVertexAccumulatorsHandler.JobVertexAccumulatorsJsonArchivist(),//jobs/:jobid/vertices/:vertexid/accumulators
new SubtasksAllAccumulatorsHandler.SubtasksAllAccumulatorsJsonArchivist(),//jobs/:jobid/vertices/:vertexid/subtasks/accumulators

new SubtaskExecutionAttemptDetailsHandler.SubtaskExecutionAttemptDetailsJsonArchivist(),//jobs/:jobid/vertices/:vertexid/subtasks/:subtasknum, jobs/:jobid/vertices/:vertexid/subtasks/:subtasknum/attempts/:attempt
new SubtaskExecutionAttemptAccumulatorsHandler.SubtaskExecutionAttemptAccumulatorsJsonArchivist(),//jobs/:jobid/vertices/:vertexid/subtasks/:subtasknum/attempts/:attempt/accumulators

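All of the archivists above are JsonArchivist implementations.
JsonArchivist declares a single method: Collection<ArchivedJson> archiveJsonWithPath(AccessExecutionGraph graph) throws IOException
Each implementation extracts the relevant information from the graph and packages it into ArchivedJson objects. ArchivedJson is defined as follows: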
public ArchivedJson(String path, String json) {
    this.path = Preconditions.checkNotNull(path);
    this.json = Preconditions.checkNotNull(json);
}
Here path specifies where the entry is stored, and json is the content to store.
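A minimal sketch of this contract, under stated assumptions: ArchivedJson and the single-method interface below are simplified mirrors, ToyGraph and JobConfigLikeArchivist are hypothetical stand-ins for AccessExecutionGraph and a real archivist, and Objects.requireNonNull replaces Flink's Preconditions.checkNotNull so the example needs no Flink dependency.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.Objects;

// Simplified mirrors of ArchivedJson and JsonArchivist; ToyGraph replaces AccessExecutionGraph.
public class JsonArchivistSketch {

    // Where to store (path) and what to store (json); requireNonNull replaces checkNotNull.
    static final class ArchivedJson {
        private final String path;
        private final String json;

        ArchivedJson(String path, String json) {
            this.path = Objects.requireNonNull(path);
            this.json = Objects.requireNonNull(json);
        }

        String getPath() { return path; }
        String getJson() { return json; }
    }

    // Hypothetical stand-in for the execution graph, reduced to two fields.
    static final class ToyGraph {
        final String jobId;
        final String jobName;

        ToyGraph(String jobId, String jobName) {
            this.jobId = jobId;
            this.jobName = jobName;
        }
    }

    // Same single-method shape as JsonArchivist, but over ToyGraph.
    interface ToyJsonArchivist {
        Collection<ArchivedJson> archiveJsonWithPath(ToyGraph graph) throws IOException;
    }

    // Hypothetical archivist whose path mimics jobs/:jobid/config with the job ID filled in.
    static final class JobConfigLikeArchivist implements ToyJsonArchivist {
        @Override
        public Collection<ArchivedJson> archiveJsonWithPath(ToyGraph graph) {
            String path = "/jobs/" + graph.jobId + "/config";
            // No escaping here; acceptable only for the toy values used below.
            String json = "{\"jid\":\"" + graph.jobId + "\",\"name\":\"" + graph.jobName + "\"}";
            return Collections.singletonList(new ArchivedJson(path, json));
        }
    }

    public static ArchivedJson demo() {
        ToyGraph graph = new ToyGraph("a1b2", "WordCount");
        return new JobConfigLikeArchivist().archiveJsonWithPath(graph).iterator().next();
    }

    public static void main(String[] args) {
        ArchivedJson entry = demo();
        System.out.println(entry.getPath()); // /jobs/a1b2/config
        System.out.println(entry.getJson()); // {"jid":"a1b2","name":"WordCount"}
    }
}
```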

To add a new REST endpoint, add a new JsonArchivist to the list above.
To add fields to an existing REST endpoint, modify the corresponding archivist.

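5. Once this process completes, each job has produced a single JSON file on HDFS containing one entry per REST path, each path identifying the corresponding view of the job.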
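The write side of step 5 can be sketched as follows. This assumes a simplified file layout in which all of one job's entries go into a single file named after the job ID, with a body of the form {"archive":[{"path":...,"json":...}]}. It writes to a local directory with java.nio rather than through Flink's FileSystem abstraction, and it hand-rolls minimal JSON escaping where a real implementation would use a JSON library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// Simplified write side: one job's (path, json) entries -> one archive file named after the job.
public class SingleFileArchiveSketch {

    // Minimal JSON string escaping (quotes, backslashes, newlines only).
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default: sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Writes {"archive":[{"path":...,"json":...},...]} to rootDir/<jobId>.
    static Path archiveJob(Path rootDir, String jobId, List<Map.Entry<String, String>> entries)
            throws IOException {
        StringBuilder body = new StringBuilder("{\"archive\":[");
        for (int i = 0; i < entries.size(); i++) {
            Map.Entry<String, String> entry = entries.get(i);
            if (i > 0) {
                body.append(',');
            }
            body.append("{\"path\":").append(quote(entry.getKey()))
                .append(",\"json\":").append(quote(entry.getValue())).append('}');
        }
        body.append("]}");
        Files.createDirectories(rootDir);
        Path target = rootDir.resolve(jobId);
        Files.write(target, body.toString().getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static String demo() {
        try {
            Path root = Files.createTempDirectory("completed-jobs");
            Path file = archiveJob(root, "a1b2", List.of(
                    Map.entry("/jobs/a1b2/config", "{\"jid\":\"a1b2\"}"),
                    Map.entry("/jobs/a1b2/exceptions", "{\"root-exception\":null}")));
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```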

Reading history jobs
org.apache.flink.runtime.webmonitor.history
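1. HistoryServer is responsible for storing and serving historical jobs. It contains a HistoryServerArchiveFetcher, which is constructed with the refresh interval, the directories to fetch from, and the local temporary directory.
2. At the configured interval, HistoryServerArchiveFetcher runs a JobArchiveFetcherTask on a separate thread to fetch archives.
3. JobArchiveFetcherTask is a task that repeatedly pulls data from the configured directories and stores it in the specified local directory. If the job overview has been flagged for updating during a fetch, joboverview is regenerated once the fetch completes.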
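Steps 1 to 3 amount to a periodic poll-and-copy loop. The sketch below models it with a ScheduledExecutorService under several assumptions: the remote archive directories are plain local directories, archives are named by job ID, already-fetched jobs are tracked in an in-memory set, and the joboverview refresh is reduced to a hypothetical callback that runs only when a pass copied something new.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Simplified fetcher: polls archive directories and copies new job archives locally.
public class ArchiveFetcherSketch {

    private final List<Path> archiveDirs;  // where JobManagers write their archives
    private final Path localDir;           // local temporary directory served to the UI
    private final Runnable updateOverview; // hypothetical hook for refreshing joboverview
    private final Set<String> fetched = new HashSet<>(); // job IDs already copied

    ArchiveFetcherSketch(List<Path> archiveDirs, Path localDir, Runnable updateOverview) {
        this.archiveDirs = archiveDirs;
        this.localDir = localDir;
        this.updateOverview = updateOverview;
    }

    // One pass over all archive directories; returns the number of newly copied jobs.
    int fetchOnce() {
        int newJobs = 0;
        try {
            Files.createDirectories(localDir);
            for (Path dir : archiveDirs) {
                try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                    for (Path file : files) {
                        String jobId = file.getFileName().toString(); // assumed: named by job ID
                        if (!Files.isRegularFile(file) || fetched.contains(jobId)) {
                            continue;
                        }
                        Files.copy(file, localDir.resolve(jobId), StandardCopyOption.REPLACE_EXISTING);
                        fetched.add(jobId); // mark only after a successful copy
                        newJobs++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (newJobs > 0) {
            updateOverview.run(); // refresh the overview only when something changed
        }
        return newJobs;
    }

    // Runs fetchOnce on one background thread, waiting refreshIntervalMs between passes.
    ScheduledExecutorService start(long refreshIntervalMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> {
            try {
                fetchOnce();
            } catch (UncheckedIOException e) {
                // keep polling; a real fetcher would log the failure
            }
        }, 0, refreshIntervalMs, TimeUnit.MILLISECONDS);
        return executor;
    }

    public static int[] demo() {
        try {
            Path remote = Files.createTempDirectory("archive");
            Path local = Files.createTempDirectory("history-web");
            Files.write(remote.resolve("job-1"), "{}".getBytes(StandardCharsets.UTF_8));
            int[] overviewUpdates = {0};
            ArchiveFetcherSketch fetcher =
                    new ArchiveFetcherSketch(Arrays.asList(remote), local, () -> overviewUpdates[0]++);
            int first = fetcher.fetchOnce();  // copies job-1
            int second = fetcher.fetchOnce(); // nothing new this time
            return new int[] {first, second, overviewUpdates[0]};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [1, 0, 1]
    }
}
```

In a long-running process, start(refreshIntervalMs) drives the same fetchOnce pass on a background thread; scheduleWithFixedDelay waits the full interval after each pass finishes, so slow passes never overlap.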
4.org.apache.flink.runtime.history
The task calls FsJobArchivist's Collection<ArchivedJson> getArchivedJsons(Path file) to read the data; file points to a stored archive, and the method returns all the JSON entries stored there.

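5. When this process finishes, the local temporary directory contains one directory per job, with many subdirectories that store the various JSON files by category.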
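The per-job directory layout in step 5 can be sketched as a function that expands one job's (path, json) pairs into files. It assumes the pairs have already been read back, which is the role getArchivedJsons plays above, and that each REST path maps to <localDir><path>.json; the .json suffix and the exact layout are illustrative assumptions rather than a documented contract.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Expands one job's (path, json) entries into a directory tree of small JSON files.
public class ExpandArchiveSketch {

    // Maps a REST path such as /jobs/a1b2/config to localDir/jobs/a1b2/config.json.
    static Path writeEntry(Path localDir, String restPath, String json) throws IOException {
        String relative = restPath.startsWith("/") ? restPath.substring(1) : restPath;
        Path base = localDir.normalize();
        Path target = base.resolve(relative + ".json").normalize();
        if (!target.startsWith(base)) { // guard against paths like "/../x"
            throw new IOException("Refusing to write outside " + base + ": " + restPath);
        }
        Files.createDirectories(target.getParent());
        Files.write(target, json.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    static void expand(Path localDir, List<Map.Entry<String, String>> entries) throws IOException {
        for (Map.Entry<String, String> entry : entries) {
            writeEntry(localDir, entry.getKey(), entry.getValue());
        }
    }

    public static List<String> demo() {
        try {
            Path local = Files.createTempDirectory("history-web");
            expand(local, List.of(
                    Map.entry("/jobs/a1b2/config", "{\"jid\":\"a1b2\"}"),
                    Map.entry("/jobs/a1b2/vertices/v1/subtasktimes", "{}")));
            try (Stream<Path> walk = Files.walk(local)) {
                return walk.filter(Files::isRegularFile)
                        .map(p -> local.relativize(p).toString().replace('\\', '/'))
                        .sorted()
                        .collect(Collectors.toList());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [jobs/a1b2/config.json, jobs/a1b2/vertices/v1/subtasktimes.json]
    }
}
```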

File storage

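As the process above shows, the JobManager does not optimize for frequent reads when it writes the archive, so each job is written as one large file, which also suits HDFS (it handles a few large files better than many small ones). In the HistoryServer's local storage, by contrast, that single HDFS file is split by path and view into many JSON files so that the web UI can serve each view directly.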
