The MesosSchedulerDriver code is implemented in src/sched/sched.cpp.

Driver->run() calls start().

start() first detects the leader of the Mesos-Master.

It then creates a thread.

The initialize() function of SchedulerProcess mainly registers the message handlers:

    virtual void initialize()
    {
      install<Event>(&SchedulerProcess::receive);

      // TODO(benh): Get access to flags so that we can decide whether
      // or not to make ZooKeeper verbose.
      install<FrameworkRegisteredMessage>(
          &SchedulerProcess::registered,
          &FrameworkRegisteredMessage::framework_id,
          &FrameworkRegisteredMessage::master_info);

      install<FrameworkReregisteredMessage>(
          &SchedulerProcess::reregistered,
          &FrameworkReregisteredMessage::framework_id,
          &FrameworkReregisteredMessage::master_info);

      install<ResourceOffersMessage>(
          &SchedulerProcess::resourceOffers,
          &ResourceOffersMessage::offers,
          &ResourceOffersMessage::pids);

      install<RescindResourceOfferMessage>(
          &SchedulerProcess::rescindOffer,
          &RescindResourceOfferMessage::offer_id);

      install<StatusUpdateMessage>(
          &SchedulerProcess::statusUpdate,
          &StatusUpdateMessage::update,
          &StatusUpdateMessage::pid);

      install<LostSlaveMessage>(
          &SchedulerProcess::lostSlave,
          &LostSlaveMessage::slave_id);

      install<ExitedExecutorMessage>(
          &SchedulerProcess::lostExecutor,
          &ExitedExecutorMessage::executor_id,
          &ExitedExecutorMessage::slave_id,
          &ExitedExecutorMessage::status);

      install<ExecutorToFrameworkMessage>(
          &SchedulerProcess::frameworkMessage,
          &ExecutorToFrameworkMessage::slave_id,
          &ExecutorToFrameworkMessage::executor_id,
          &ExecutorToFrameworkMessage::data);

      install<FrameworkErrorMessage>(
          &SchedulerProcess::error,
          &FrameworkErrorMessage::message);

      // Start detecting masters.
      detector->detect()
        .onAny(defer(self(), &SchedulerProcess::detected, lambda::_1));
    }
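
The registration pattern in initialize() can be pictured as a table that maps a message type to the handler installed for it. The following is only a minimal sketch of that idea in plain C++, not the actual install() implementation used by SchedulerProcess. `Dispatcher`, `Message`, `visit`, and the string-keyed lookup are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a received message: a type name plus a payload.
struct Message {
  std::string name;
  std::string body;
};

// Hypothetical dispatcher that maps a message name to its installed handler.
class Dispatcher {
public:
  using Handler = std::function<void(const Message&)>;

  // Register a handler for a message name; a later call for the same
  // name replaces the earlier handler.
  void install(const std::string& name, Handler handler) {
    assert(static_cast<bool>(handler));  // an empty handler cannot be invoked
    handlers[name] = std::move(handler);
  }

  // Deliver a message to its handler. Returns false when no handler
  // has been installed for that message name.
  bool visit(const Message& message) const {
    auto it = handlers.find(message.name);
    if (it == handlers.end()) {
      return false;
    }
    it->second(message);
    return true;
  }

private:
  std::map<std::string, Handler> handlers;
};
```

In this sketch, a message whose name was never installed is simply reported as unhandled, which mirrors why every message the driver cares about has to be registered up front.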

 

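An earlier article in this series, Mesos Source Code Analysis (6): Initialization of the Mesos Master, showed that the OfferCallback passed into the Allocator's initialize function is Master::offer.

Every allocation_interval, the Allocator computes the offers for each framework and then calls Master::offer for each of them in turn, offering the resources to the corresponding framework.

Master::offer builds a ResourceOffersMessage and sends it to the Framework.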
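
The callback relationship between the Allocator and Master::offer can be sketched roughly as follows. This is a toy model under stated assumptions, not the Mesos allocator: `ToyAllocator`, `addFramework`, and the even split of CPUs are hypothetical and chosen only to keep the example short. The periodic timer is left out; each call to `allocate` stands for one allocation_interval tick.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified types: a framework is identified by a string
// and an offer is reduced to a number of CPUs.
using FrameworkId = std::string;
using OfferCallback = std::function<void(const FrameworkId&, double)>;

class ToyAllocator {
public:
  // Store the callback. In the article's flow this role is played by
  // Master::offer.
  void initialize(OfferCallback callback) {
    assert(static_cast<bool>(callback));
    offerCallback = std::move(callback);
  }

  void addFramework(const FrameworkId& id) { frameworks.push_back(id); }

  // One allocation round: split the CPUs evenly (an assumption made for
  // this sketch) and invoke the callback once per framework.
  void allocate(double totalCpus) {
    if (!offerCallback || frameworks.empty()) {
      return;
    }
    const double each = totalCpus / static_cast<double>(frameworks.size());
    for (const FrameworkId& id : frameworks) {
      offerCallback(id, each);
    }
  }

private:
  OfferCallback offerCallback;
  std::vector<FrameworkId> frameworks;
};
```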

 

On the Driver side, receiving a ResourceOffersMessage triggers SchedulerProcess::resourceOffers:

 

    void resourceOffers(
        const UPID& from,
        const vector<Offer>& offers,
        const vector<string>& pids)
    {
      ……
      VLOG(2) << "Received " << offers.size() << " offers";
      ……
      scheduler->resourceOffers(driver, offers);

      VLOG(1) << "Scheduler::resourceOffers took " << stopwatch.elapsed();
    }

 

This ultimately calls the Framework's resourceOffers.

 

The Test Framework's resourceOffers function creates a series of tasks from the offers it receives and then calls the driver's launchTasks function:

    virtual void resourceOffers(SchedulerDriver* driver,
                                const vector<Offer>& offers)
    {
      foreach (const Offer& offer, offers) {
        cout << "Received offer " << offer.id() << " with " << offer.resources()
             << endl;

        static const Resources TASK_RESOURCES = Resources::parse(
            "cpus:" + stringify(CPUS_PER_TASK) +
            ";mem:" + stringify(MEM_PER_TASK)).get();

        Resources remaining = offer.resources();

        // Launch tasks.
        vector<TaskInfo> tasks;
        while (tasksLaunched < totalTasks &&
               remaining.flatten().contains(TASK_RESOURCES)) {
          int taskId = tasksLaunched++;

          cout << "Launching task " << taskId << " using offer "
               << offer.id() << endl;

          TaskInfo task;
          task.set_name("Task " + lexical_cast<string>(taskId));
          task.mutable_task_id()->set_value(lexical_cast<string>(taskId));
          task.mutable_slave_id()->MergeFrom(offer.slave_id());
          task.mutable_executor()->MergeFrom(executor);

          Option<Resources> resources =
            remaining.find(TASK_RESOURCES.flatten(role));

          CHECK_SOME(resources);
          task.mutable_resources()->MergeFrom(resources.get());
          remaining -= resources.get();

          tasks.push_back(task);
        }

        driver->launchTasks(offer.id(), tasks);
      }
    }
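
The while loop in the Test Framework keeps carving tasks out of one offer until either the offer can no longer hold another task or the total task budget is used up. A minimal sketch of that arithmetic, assuming a simplified two-field resource bundle instead of the real Resources class (`Res` and `carveTasks` are hypothetical names):

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified resource bundle standing in for Resources.
struct Res {
  double cpus;
  double mem;

  // True when this bundle has at least as much of every resource as other.
  bool contains(const Res& other) const {
    return cpus >= other.cpus && mem >= other.mem;
  }
};

// Carve tasks out of one offer. tasksLaunched is shared across offers,
// as in the Test Framework, so it is taken by reference and advanced.
// Returns the ids of the tasks created from this offer.
std::vector<int> carveTasks(Res remaining, const Res& perTask,
                            int& tasksLaunched, int totalTasks) {
  assert(tasksLaunched >= 0 && totalTasks >= 0);
  std::vector<int> taskIds;
  while (tasksLaunched < totalTasks && remaining.contains(perTask)) {
    taskIds.push_back(tasksLaunched++);
    remaining.cpus -= perTask.cpus;
    remaining.mem -= perTask.mem;
  }
  return taskIds;
}
```

With an offer of 4 CPUs and 1024 MB, a per-task need of 1 CPU and 128 MB, and a budget of 5 tasks, the first offer yields four tasks because CPUs run out first, and a second identical offer yields only the one remaining task.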

 

The launchTasks function of SchedulerProcess is implemented as follows:

    void launchTasks(const vector<OfferID>& offerIds,
                     const vector<TaskInfo>& tasks,
                     const Filters& filters)
    {
      Offer::Operation operation;
      operation.set_type(Offer::Operation::LAUNCH);

      Offer::Operation::Launch* launch = operation.mutable_launch();
      foreach (const TaskInfo& task, tasks) {
        launch->add_task_infos()->CopyFrom(task);
      }

      acceptOffers(offerIds, {operation}, filters);
    }

    void acceptOffers(
        const vector<OfferID>& offerIds,
        const vector<Offer::Operation>& operations,
        const Filters& filters)
    {
      // TODO(jieyu): Move all driver side verification to master since
      // we are moving towards supporting pure language scheduler.

      if (!connected) {
        VLOG(1) << "Ignoring accept offers message as master is disconnected";

        // NOTE: Reply to the framework with TASK_LOST messages for each
        // task launch. See details from notes in launchTasks.
        foreach (const Offer::Operation& operation, operations) {
          if (operation.type() != Offer::Operation::LAUNCH) {
            continue;
          }

          foreach (const TaskInfo& task, operation.launch().task_infos()) {
            StatusUpdate update = protobuf::createStatusUpdate(
                framework.id(),
                None(),
                task.task_id(),
                TASK_LOST,
                TaskStatus::SOURCE_MASTER,
                None(),
                "Master disconnected",
                TaskStatus::REASON_MASTER_DISCONNECTED);

            statusUpdate(UPID(), update, UPID());
          }
        }
        return;
      }

      Call call;
      CHECK(framework.has_id());
      call.mutable_framework_id()->CopyFrom(framework.id());
      call.set_type(Call::ACCEPT);

      Call::Accept* accept = call.mutable_accept();

      // Setting accept.operations.
      foreach (const Offer::Operation& _operation, operations) {
        Offer::Operation* operation = accept->add_operations();
        operation->CopyFrom(_operation);
      }

      // Setting accept.offer_ids.
      foreach (const OfferID& offerId, offerIds) {
        accept->add_offer_ids()->CopyFrom(offerId);

        if (!savedOffers.contains(offerId)) {
          // TODO(jieyu): A duplicated offer ID could also cause this
          // warning being printed. Consider refine this message here
          // and in launchTasks as well.
          LOG(WARNING) << "Attempting to accept an unknown offer " << offerId;
        } else {
          // Keep only the slave PIDs where we run tasks so we can send
          // framework messages directly.
          foreach (const Offer::Operation& operation, operations) {
            if (operation.type() != Offer::Operation::LAUNCH) {
              continue;
            }

            foreach (const TaskInfo& task, operation.launch().task_infos()) {
              const SlaveID& slaveId = task.slave_id();

              if (savedOffers[offerId].contains(slaveId)) {
                savedSlavePids[slaveId] = savedOffers[offerId][slaveId];
              } else {
                LOG(WARNING) << "Attempting to launch task " << task.task_id()
                             << " with the wrong slave id " << slaveId;
              }
            }
          }
        }

        // Remove the offer since we saved all the PIDs we might use.
        savedOffers.erase(offerId);
      }

      // Setting accept.filters.
      accept->mutable_filters()->CopyFrom(filters);

      CHECK_SOME(master);
      send(master.get().pid(), call);
    }
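
The disconnected branch of acceptOffers is easy to miss: nothing is sent to the master, and every task inside a LAUNCH operation is reported back to the framework as TASK_LOST. A minimal sketch of that rule, assuming simplified stand-in types (`OpType`, `Operation`, and `lostTasks` are hypothetical and not part of Mesos):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified operation: a type plus the ids of the tasks it
// would launch (only meaningful for LAUNCH).
enum class OpType { LAUNCH, RESERVE };

struct Operation {
  OpType type;
  std::vector<std::string> taskIds;
};

// Returns the ids of the tasks that would be reported as TASK_LOST.
// When connected, nothing is lost and the result is empty.
std::vector<std::string> lostTasks(bool connected,
                                   const std::vector<Operation>& operations) {
  std::vector<std::string> lost;
  if (connected) {
    return lost;
  }
  for (const Operation& operation : operations) {
    if (operation.type != OpType::LAUNCH) {
      continue;  // non-LAUNCH operations produce no status update
    }
    for (const std::string& id : operation.taskIds) {
      lost.push_back(id);
    }
  }
  return lost;
}
```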

 

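Finally, the driver sends the message to the leader of the Mesos-Master. In the code above this is a Call of type ACCEPT that carries the LAUNCH operation built by launchTasks.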
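
The savedOffers bookkeeping inside acceptOffers can be sketched in isolation: once an offer is accepted, only the PIDs of the slaves that will actually run tasks are kept, and the offer itself is forgotten. The types below are simplified assumptions (`OfferBook` and its string aliases are hypothetical), and the example PID string is made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical string aliases standing in for OfferID, SlaveID and UPID.
using OfferId = std::string;
using SlaveId = std::string;
using Pid = std::string;

struct OfferBook {
  std::map<OfferId, std::map<SlaveId, Pid>> savedOffers;
  std::map<SlaveId, Pid> savedSlavePids;

  // Accept one offer on behalf of tasks placed on taskSlaves.
  // Returns false for an unknown offer id, which is the case where the
  // driver only logs a warning.
  bool accept(const OfferId& offerId, const std::vector<SlaveId>& taskSlaves) {
    auto offer = savedOffers.find(offerId);
    if (offer == savedOffers.end()) {
      return false;
    }
    for (const SlaveId& slaveId : taskSlaves) {
      auto pid = offer->second.find(slaveId);
      if (pid != offer->second.end()) {
        savedSlavePids[slaveId] = pid->second;  // keep for direct messaging
      }
      // A slave id that is not part of the offer is skipped here; the
      // driver logs a "wrong slave id" warning in that case.
    }
    savedOffers.erase(offer);  // an offer can be used only once
    return true;
  }
};
```

Accepting the same offer id a second time returns false in this sketch, which corresponds to the duplicated-offer case mentioned in the TODO comment of acceptOffers.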
