我们在https://www.cnblogs.com/dongxiao-yang/p/9403427.html文章里分析了flink提交single job到yarn集群上的代码,flink在1.5版本后对整个框架的deploy方式重构了全新的流程(参考https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077),本文基于flink1.6.1版本源码分析一下新模式在yarn的整个流程。

一 初始化

客户端本地整个初始化流程与https://www.cnblogs.com/dongxiao-yang/p/9403427.html差不多,由于newmode的关系,几个有区别的地方为

1 final ClusterDescriptor<T> clusterDescriptor = customCommandLine.createClusterDescriptor(commandLine); ,返回的具体对象类为YarnClusterDescriptor

2 ClientFrontend.runProgram方法会进入if (isNewMode && clusterId == null && runOptions.getDetachedMode()) {..方法块,调用路径为

YarnClusterDescriptor.deployJobCluster->AbstractYarnClusterDescriptor.deployInternal->startAppMaster

这个时候我们发现AM的启动类变成了YarnJobClusterEntrypoint

二 YarnJobClusterEntrypoint

YarnJobClusterEntrypoint的main函数是整个AM进程的启动入口,在方法的最后会调用其祖父类ClusterEntrypoint的startCluster方法开启整个集群组件的启动过程。

具体调用链路为startCluster->runCluster->startClusterComponents

protected void startClusterComponents(
Configuration configuration,
RpcService rpcService,
HighAvailabilityServices highAvailabilityServices,
BlobServer blobServer,
HeartbeatServices heartbeatServices,
MetricRegistry metricRegistry) throws Exception {
synchronized (lock) {
dispatcherLeaderRetrievalService = highAvailabilityServices.getDispatcherLeaderRetriever(); resourceManagerRetrievalService = highAvailabilityServices.getResourceManagerLeaderRetriever(); LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever = new RpcGatewayRetriever<>(
rpcService,
DispatcherGateway.class,
DispatcherId::fromUuid,
10,
Time.milliseconds(50L)); LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever = new RpcGatewayRetriever<>(
rpcService,
ResourceManagerGateway.class,
ResourceManagerId::fromUuid,
10,
Time.milliseconds(50L)); // TODO: Remove once we have ported the MetricFetcher to the RpcEndpoint
final ActorSystem actorSystem = ((AkkaRpcService) rpcService).getActorSystem();
final Time timeout = Time.milliseconds(configuration.getLong(WebOptions.TIMEOUT)); webMonitorEndpoint = createRestEndpoint(
configuration,
dispatcherGatewayRetriever,
resourceManagerGatewayRetriever,
transientBlobCache,
rpcService.getExecutor(),
new AkkaQueryServiceRetriever(actorSystem, timeout),
highAvailabilityServices.getWebMonitorLeaderElectionService()); LOG.debug("Starting Dispatcher REST endpoint.");
webMonitorEndpoint.start(); resourceManager = createResourceManager(
configuration,
ResourceID.generate(),
rpcService,
highAvailabilityServices,
heartbeatServices,
metricRegistry,
this,
clusterInformation,
webMonitorEndpoint.getRestBaseUrl()); jobManagerMetricGroup = MetricUtils.instantiateJobManagerMetricGroup(metricRegistry, rpcService.getAddress()); final HistoryServerArchivist historyServerArchivist = HistoryServerArchivist.createHistoryServerArchivist(configuration, webMonitorEndpoint); dispatcher = createDispatcher(
configuration,
rpcService,
highAvailabilityServices,
resourceManager.getSelfGateway(ResourceManagerGateway.class),
blobServer,
heartbeatServices,
jobManagerMetricGroup,
metricRegistry.getMetricQueryServicePath(),
archivedExecutionGraphStore,
this,
webMonitorEndpoint.getRestBaseUrl(),
historyServerArchivist); LOG.debug("Starting ResourceManager.");
resourceManager.start();
resourceManagerRetrievalService.start(resourceManagerGatewayRetriever); LOG.debug("Starting Dispatcher.");
dispatcher.start();
dispatcherLeaderRetrievalService.start(dispatcherGatewayRetriever);
}
}

从上述代码里可以发现,AM里面包含两个重要的全新组件:ResourceManager和Dispatcher

在FLIP6的改进下,Resource这个全新的角色定义如下:

The main tasks of the ResourceManager are

  • Acquire new TaskManager (or slots) by starting containers, or allocating them to a job

  • Giving failure notifications to JobManagers and TaskManagers

  • Caching TaskManagers (containers) to be reused, releasing TaskManagers (containers) that are unused for a certain period.

大体来说就是由ResourceManager负责和YARN集群进行资源申请上的沟通,并给指定JobManager分配特定

aa

在yarn模式下,ResourceManager对应的实现类为YarnResourceManager,在这个类的initialize方法中,我们可以发现它实例化了两个client,resourceManagerClient和nodeManagerClient,这两个客户端分别包含了Yarn框架的AMRMClientAsync和NMClient,分别用来负责和Yarn的ResourceManager和NodeManager通信。

    @Override
protected void initialize() throws ResourceManagerException {
try {
resourceManagerClient = createAndStartResourceManagerClient(
yarnConfig,
yarnHeartbeatIntervalMillis,
webInterfaceUrl);
} catch (Exception e) {
throw new ResourceManagerException("Could not start resource manager client.", e);
} nodeManagerClient = createAndStartNodeManagerClient(yarnConfig);
}

关于Dispatcher的定义如下,它取代了以前由jobManager负责的提交job给集群的工作,并且预期将来可以由一个dispatcher提交任务给多个集群。

The new design includes the concept of a Dispatcher. The dispatcher accepts job submissions from clients and starts the jobs on their behalf on a cluster manager.

The dispatcher is introduced because:

  • Some cluster managers need a central job spawning and monitoring instance

  • It subsumes the role of the standalone JobManager, waiting for jobs to be submitted

在本文的条件下,Dispatcher具体的实现类为MiniDispatcher,在dispatcher.start();调用后,整个调用链经过了

leaderElectionService.start(this)->
ZooKeeperLeaderElectionService.start->
ZooKeeperLeaderElectionService.isLeader->
Dispatcher.grantLeadership->
tryAcceptLeadershipAndRunJobs->
runJob->
createJobManagerRunner

调到了DisPatcher的createJobManagerRunner方法。

	private CompletableFuture<JobManagerRunner> createJobManagerRunner(JobGraph jobGraph) {
final RpcService rpcService = getRpcService(); final CompletableFuture<JobManagerRunner> jobManagerRunnerFuture = CompletableFuture.supplyAsync(
CheckedSupplier.unchecked(() ->
jobManagerRunnerFactory.createJobManagerRunner(
ResourceID.generate(),
jobGraph,
configuration,
rpcService,
highAvailabilityServices,
heartbeatServices,
blobServer,
jobManagerSharedServices,
new DefaultJobManagerJobMetricGroupFactory(jobManagerMetricGroup),
fatalErrorHandler)),
rpcService.getExecutor()); return jobManagerRunnerFuture.thenApply(FunctionUtils.uncheckedFunction(this::startJobManagerRunner));
}

  

上述代码可以分为两个部分,第一部分经过DefaultJobManagerRunnerFactory.createJobManagerRunner->new JobManagerRunner->new   JobMaster初始化了JobMaster对象。

第二部分经过

startJobManagerRunner->
JobManagerRunner.start->
ZooKeeperLeaderElectionService.start->
ZooKeeperLeaderElectionService.isLeader->

JobManagerRunner.grantLeadership->
verifyJobSchedulingStatusAndStartJobManager->

jobMaster.start->
startJobExecution->
	private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {
validateRunsInMainThread(); checkNotNull(newJobMasterId, "The new JobMasterId must not be null."); if (Objects.equals(getFencingToken(), newJobMasterId)) {
log.info("Already started the job execution with JobMasterId {}.", newJobMasterId); return Acknowledge.get();
} setNewFencingToken(newJobMasterId); startJobMasterServices(); log.info("Starting execution of job {} ({})", jobGraph.getName(), jobGraph.getJobID()); resetAndScheduleExecutionGraph(); return Acknowledge.get();
} private void startJobMasterServices() throws Exception {
// start the slot pool make sure the slot pool now accepts messages for this leader
slotPool.start(getFencingToken(), getAddress()); //TODO: Remove once the ZooKeeperLeaderRetrieval returns the stored address upon start
// try to reconnect to previously known leader
reconnectToResourceManager(new FlinkException("Starting JobMaster component.")); // job is ready to go, try to establish connection with resource manager
// - activate leader retrieval for the resource manager
// - on notification of the leader, the connection will be established and
// the slot pool will start requesting slots
resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
}  

JobMaster首先调用startJobMasterServices进行连接flink resource manager ,启动jobmanager服务并注册等操作。然后通过resetAndScheduleExecutionGraph执行任务资源的初始化申请。resetAndScheduleExecutionGraph方法首先调用createAndRestoreExecutionGraph生成了整个任务的executiongraph,然后通过

scheduleExecutionGraph->
ExecutionGraph.scheduleForExecution->
scheduleEager->
ExecutionJobVertex.allocateResourcesForAll->
Execution.allocateAndAssignSlotForExecution->
ProviderAndOwner.allocateSlot->
SlotPool.allocateSlot->
allocateMultiTaskSlot

提出对任务slot资源的申请

SlotPool.requestSlotFromResourceManager->
ResourceManager.requestSlot->
SlotManager.registerSlotRequest->
internalRequestSlot->
ResourceActionsImpl.allocateResource->
YarnResourceManager.startNewWorker->

申请启动新的TaskManager

    @Override
public void startNewWorker(ResourceProfile resourceProfile) {
log.info("startNewWorker");
// Priority for worker containers - priorities are intra-application
//TODO: set priority according to the resource allocated
Priority priority = Priority.newInstance(generatePriority(resourceProfile));
int mem = resourceProfile.getMemoryInMB() < 0 ? defaultTaskManagerMemoryMB : resourceProfile.getMemoryInMB();
int vcore = resourceProfile.getCpuCores() < 1 ? defaultCpus : (int) resourceProfile.getCpuCores();
Resource capability = Resource.newInstance(mem, vcore);
requestYarnContainer(capability, priority);
} private void requestYarnContainer(Resource resource, Priority priority) {
resourceManagerClient.addContainerRequest(new AMRMClient.ContainerRequest(resource, null, null, priority)); // make sure we transmit the request fast and receive fast news of granted allocations
resourceManagerClient.setHeartbeatInterval(FAST_YARN_HEARTBEAT_INTERVAL_MS); numPendingContainerRequests++; log.info("Requesting new TaskExecutor container with resources {}. Number pending requests {}.",
resource,
numPendingContainerRequests);
}

上述代码就是flink resourcemanager 通过yarn客户端与yarn通信申请taskmanager部分代码

	@Override
public void onContainersAllocated(List<Container> containers) {
log.info("onContainersAllocated");
runAsync(() -> {
for (Container container : containers) {
log.info(
"Received new container: {} - Remaining pending container requests: {}",
container.getId(),
numPendingContainerRequests); if (numPendingContainerRequests > 0) {
numPendingContainerRequests--; final String containerIdStr = container.getId().toString();
final ResourceID resourceId = new ResourceID(containerIdStr); workerNodeMap.put(resourceId, new YarnWorkerNode(container)); try {
// Context information used to start a TaskExecutor Java process
ContainerLaunchContext taskExecutorLaunchContext = createTaskExecutorLaunchContext(
container.getResource(),
containerIdStr,
container.getNodeId().getHost()); nodeManagerClient.startContainer(container, taskExecutorLaunchContext);
} catch (Throwable t) {
log.error("Could not start TaskManager in container {}.", container.getId(), t); // release the failed container
workerNodeMap.remove(resourceId);
resourceManagerClient.releaseAssignedContainer(container.getId());
// and ask for a new one
requestYarnContainer(container.getResource(), container.getPriority());
}
} else {
// return the excessive containers
log.info("Returning excess container {}.", container.getId());
resourceManagerClient.releaseAssignedContainer(container.getId());
}
} // if we are waiting for no further containers, we can go to the
// regular heartbeat interval
if (numPendingContainerRequests <= 0) {
resourceManagerClient.setHeartbeatInterval(yarnHeartbeatIntervalMillis);
}
});
}

  

am客户端在taskmanager的客户端里会设置启动的主类org.apache.flink.yarn.YarnTaskExecutorRunner

flink on yarn部分源码解析 (FLIP-6 new mode)的更多相关文章

  1. flink on yarn部分源码解析

    转发请注明原创地址:https://www.cnblogs.com/dongxiao-yang/p/9403427.html flink任务的deploy形式有很多种选择,常见的有standalone ...

  2. [源码解析] 从TimeoutException看Flink的心跳机制

    [源码解析] 从TimeoutException看Flink的心跳机制 目录 [源码解析] 从TimeoutException看Flink的心跳机制 0x00 摘要 0x01 缘由 0x02 背景概念 ...

  3. [源码解析] 当 Java Stream 遇见 Flink

    [源码解析] 当 Java Stream 遇见 Flink 目录 [源码解析] 当 Java Stream 遇见 Flink 0x00 摘要 0x01 领域 1.1 Flink 1.2 Java St ...

  4. [源码解析] Flink的Slot究竟是什么?(1)

    [源码解析] Flink的Slot究竟是什么?(1) 目录 [源码解析] Flink的Slot究竟是什么?(1) 0x00 摘要 0x01 概述 & 问题 1.1 Fllink工作原理 1.2 ...

  5. [源码解析] Flink的Slot究竟是什么?(2)

    [源码解析] Flink 的slot究竟是什么?(2) 目录 [源码解析] Flink 的slot究竟是什么?(2) 0x00 摘要 0x01 前文回顾 0x02 注册/更新Slot 2.1 Task ...

  6. 《Flink 源码解析》—— 源码编译运行

    更新一篇知识星球里面的源码分析文章,去年写的,周末自己录了个视频,大家看下效果好吗?如果好的话,后面补录发在知识星球里面的其他源码解析文章. 前言 之前自己本地 clone 了 Flink 的源码,编 ...

  7. Flink 源码解析 —— 源码编译运行

    更新一篇知识星球里面的源码分析文章,去年写的,周末自己录了个视频,大家看下效果好吗?如果好的话,后面补录发在知识星球里面的其他源码解析文章. 前言 之前自己本地 clone 了 Flink 的源码,编 ...

  8. Flink 源码解析 —— 如何获取 ExecutionGraph ?

    https://t.zsxq.com/UnA2jIi 博客 1.Flink 从0到1学习 -- Apache Flink 介绍 2.Flink 从0到1学习 -- Mac 上搭建 Flink 1.6. ...

  9. Flink 源码解析 —— 深度解析 Flink 是如何管理好内存的?

    前言 如今,许多用于分析大型数据集的开源系统都是用 Java 或者是基于 JVM 的编程语言实现的.最着名的例子是 Apache Hadoop,还有较新的框架,如 Apache Spark.Apach ...

随机推荐

  1. Android进阶笔记:AIDL内部实现详解 (一)

    AIDL内部实现详解 (一) AIDL的作用是实现跨进程通讯使用方法也非常的简单,他的设计模式是典型的C/S架构.使用AIDL只要在Client端和Server端的项目根目录下面创建一个aidl的文件 ...

  2. html之marquee会移动的文字

    该标签不是HTML3.2的一部分,并且只支持MSIE3以后内核,所以如果你使用非IE内核浏览器(如:Netscape)可能无法看到下面一些很有意思的效果 该标签是个容器标签 语法: <marqu ...

  3. ylbtech-LanguageSamples-NamedAndOptional(命名和可选参数)

    ylbtech-Microsoft-CSharpSamples:ylbtech-LanguageSamples-NamedAndOptional(命名和可选参数) 1.A,示例(Sample) 返回顶 ...

  4. FPGA作为从机与STM32进行SPI协议通信---Verilog实现

    一.SPI协议简要介绍 SPI,是英语Serial Peripheral Interface的缩写,顾名思义就是串行外围设备接口.SPI,是一种高速的,全双工,同步的通信总线,并且在芯片的管脚上只占用 ...

  5. unity GI

    占坑 lightmapper有两种:  Enlighten, Progressive Enlighten: baked GI  静态物体 离线烘焙 precomputed GI 也是适用于静态物体 F ...

  6. Newtonsoft.Json.4.5.11使用方法总结---反序列化json字符串

    写在开头: 最近项目需求,需要在C#中处理json字符串,毫不犹豫的下载了Newtonsoft.Json 4.5.11(2012.12.17)http://json.codeplex.com/,然后百 ...

  7. 解决html视频播放只有声音没有图像的办法

    HTML5中并没有指定视频解码器,它留给了浏览器来决定. MP4有四种编码格式(MPEG4(DivX),MPEG4(Xvid),AVC(H264),HEVC(H265)): 只有使用AVC(H264) ...

  8. Centos&RHEL 6安装图形化

    Linux是一个多任务的多用户的操作系统,而在安装linux的时候经常遇到的问题-没有图形化桌面.在上节中我们演示了RHEL7安装图形化的过程,下面我们演示Centos6的图形化安装. 一.Cento ...

  9. QSignalMapper类的使用

    Qt中当定义了非常多的button,而他们的信号都同样时(比方都是点击信号),没有必要给他们每一个都设置信号和槽的链接.QSignalMapper给我们攻克了这个难题. 今天完毕这个相似的功能时,不知 ...

  10. 使用ant运行testng的testng.xml并且使用testng-results.xsl美化结果

    先看build.xml <?xml version="1.0" encoding="UTF-8"?> <project basedir=&qu ...