The device assignment algorithm currently used in TensorFlow v0.9 is the simple placer algorithm, which is much simpler to implement than the cost model algorithm described in the whitepaper. The simple placer prefers the /gpu:0 device, but it does not support multi-GPU assignment.
The cost model mentioned in the whitepaper can balance placement against device resource cost and data transfer cost. It is partially implemented in v0.9 but not yet exposed for use; see core/graph/costmodel.cc.
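To illustrate the idea only (this is not TensorFlow's cost model, and the type and function names below are made up), a cost-model-style placer might score each candidate device for a node by adding an estimated compute cost to the cost of transferring the node's inputs from wherever they currently live, then pick the cheapest device:

// Hypothetical sketch of cost-model-style placement; not code from
// core/graph/costmodel.cc. All names and cost formulas are assumptions.
#include <cstdint>
#include <limits>
#include <string>
#include <utility>
#include <vector>

struct DeviceInfo {
  std::string name;
  double seconds_per_op;       // assumed compute cost of this node on the device
  double seconds_per_byte_in;  // assumed cost of copying one input byte onto it
};

// inputs: pairs of (device currently holding the input, input size in bytes)
std::string PickDevice(const std::vector<std::pair<std::string, int64_t>>& inputs,
                       const std::vector<DeviceInfo>& devices) {
  double best_cost = std::numeric_limits<double>::infinity();
  std::string best_device;
  for (const DeviceInfo& d : devices) {
    double cost = d.seconds_per_op;     // device resource cost
    for (const auto& in : inputs) {
      if (in.first != d.name) {         // data transfer cost for remote inputs
        cost += d.seconds_per_byte_in * static_cast<double>(in.second);
      }
    }
    if (cost < best_cost) {
      best_cost = cost;
      best_device = d.name;
    }
  }
  return best_device;
}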
 
The simple placer is implemented in core/common_runtime/simple_placer.cc, which contains the core of the device assignment logic.

A test snippet from core/common_runtime/simple_placer_test.cc is shown below:

////////////////////////////////////////////////////////////////////////////////
//
// A SimplePlacerTest method has three phases:
//
// 1. Build a TensorFlow graph, with no (or partial) device assignments.
// 2. Attempt to compute a placement using the SimplePlacer.
// 3. EITHER: test that the constraints implied by the graph are respected;
//    or that an appropriate error was reported.
//
////////////////////////////////////////////////////////////////////////////////
class SimplePlacerTest : public ::testing::Test {
 protected:
  SimplePlacerTest() {
    // Build a set of 10 GPU and 10 CPU devices.
    // NOTE: this->local_devices_ owns the device objects;
    // this->devices_ contains borrowed pointers to the device
    // objects.
    for (int i = 0; i < 10; ++i) {  // add 10 fake CPU devices and 10 fake GPU devices
      local_devices_.emplace_back(FakeDevice::MakeCPU(
          strings::StrCat("/job:a/replica:0/task:0/cpu:", i)));
      devices_.AddDevice(local_devices_.back().get());
      // Insert the GPUs in reverse order.
      local_devices_.emplace_back(FakeDevice::MakeGPU(
          strings::StrCat("/job:a/replica:0/task:0/gpu:", 9 - i)));
      devices_.AddDevice(local_devices_.back().get());
    }
  }
  ...
};
...
// Test that a graph with no constraints will successfully assign nodes to the
// "best available" device (i.e. prefer GPU over CPU).
TEST_F(SimplePlacerTest, TestNoConstraints) {
  Graph g(OpRegistry::Global());
  {  // Scope for temporary variables used to construct g.
     // Build the graph structure with GraphDefBuilder.
    GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
    Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
    ops::UnaryOp("TestRelu", ops::NodeOut(input, 0), b.opts().WithName("n1"));
    ops::UnaryOp("TestRelu", ops::NodeOut(input, 1), b.opts().WithName("n2"));
    TF_EXPECT_OK(BuildGraph(b, &g));  // BuildGraph writes the GraphDefBuilder's graph into g
  }
  TF_EXPECT_OK(Place(&g));  // Place assigns the graph's nodes to devices from the device list
  // Expectation: the "in" node is placed on CPU, n1 and n2 on GPU,
  // i.e. GPU is preferred over CPU.
  EXPECT_DEVICE_TYPE(g, "in", DEVICE_CPU);
  EXPECT_DEVICE_TYPE(g, "n1", DEVICE_GPU);
  EXPECT_DEVICE_TYPE(g, "n2", DEVICE_GPU);
}

Here, BuildGraph writes the graph structure defined in the GraphDefBuilder object into a Graph, and Place assigns the graph's nodes to devices from the device list. The core of the device assignment algorithm is in SimplePlacer::Run.

// Builds the given graph, and (if successful) indexes the node
// names for use in placement, and later lookup.
Status BuildGraph(const GraphDefBuilder& builder, Graph* out_graph) {
  TF_RETURN_IF_ERROR(builder.ToGraph(out_graph));
  nodes_by_name_.clear();
  for (Node* node : out_graph->nodes()) {
    nodes_by_name_[node->name()] = node->id();
  }
  return Status::OK();
}

// Invokes the SimplePlacer on "graph". If no DeviceSet is specified, the
// placement will use the default DeviceSet (of 10 CPU and 10 GPU devices).
//
// REQUIRES: "*graph" was produced by the most recent call to BuildGraph.
Status Place(Graph* graph, DeviceSet* devices, SessionOptions* options) {
  SimplePlacer placer(graph, devices, options);
  return placer.Run();
}
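
The test above calls Place(&g) with only the graph argument. Presumably the test fixture also provides convenience overloads that forward to the three-argument version, which would explain the "If no DeviceSet is specified" comment; a plausible sketch (the actual overloads in simple_placer_test.cc may be spelled differently):

// Hypothetical convenience overloads (assumed, not quoted from the source):
// fall back to the fixture's default DeviceSet of 10 CPU and 10 GPU devices.
Status Place(Graph* graph, DeviceSet* devices) {
  return Place(graph, devices, nullptr);
}
Status Place(Graph* graph) {
  return Place(graph, &devices_, nullptr);
}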

SimplePlacer::Run() lives in core/common_runtime/simple_placer.cc; its implementation consists of four steps:

Steps 1 and 2: iterate over the graph's nodes and add each one to a ColocationGraph object (the source and sink nodes are excluded), then process the colocation constraint edges.
// 1. First add all of the nodes. Note that steps (1) and (2)
// requires two passes over the nodes because the graph (and hence
// the constraints) may not be acyclic.  (So the graph here may contain cycles?)
for (Node* node : graph_->nodes()) {
  // Skip the source and sink nodes.
  if (!node->IsOp()) {
    continue;
  }
  status = colocation_graph.AddNode(*node);
  if (!status.ok()) return AttachDef(status, node->def());
}

// 2. Enumerate the constraint edges, and use them to update the
// disjoint node set. (A disjoint set, i.e. union-find, is a tree-based
// data structure that maintains a collection of non-overlapping node sets.)
...
ColocationGraph maintains the connected components of a colocation constraint graph, and uses this information to assign a satisfying device placement to the nodes of the graph. The implementation uses the union-find algorithm to maintain the connected components efficiently and incrementally as edges (implied by ColocationGraph::ColocateNodes() invocations) are added.
Reference: the union-find (disjoint-set) data structure on Wikipedia.
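
To make the union-find idea concrete, here is a minimal, self-contained sketch of the data structure itself (an illustration only, not TensorFlow's ColocationGraph): every node starts in its own set, each colocation constraint is a Union, and two nodes belong to the same device group exactly when Find returns the same representative for both.

// Minimal union-find (disjoint-set) sketch with path compression and
// union by rank; illustration only, not code from simple_placer.cc.
#include <numeric>
#include <utility>
#include <vector>

class UnionFind {
 public:
  explicit UnionFind(int n) : parent_(n), rank_(n, 0) {
    std::iota(parent_.begin(), parent_.end(), 0);  // each element is its own set
  }
  // Returns the representative of x's set, compressing the path as it goes.
  int Find(int x) {
    return parent_[x] == x ? x : parent_[x] = Find(parent_[x]);
  }
  // Merges the sets containing a and b (e.g. a colocation constraint between them).
  void Union(int a, int b) {
    a = Find(a);
    b = Find(b);
    if (a == b) return;
    if (rank_[a] < rank_[b]) std::swap(a, b);
    parent_[b] = a;
    if (rank_[a] == rank_[b]) ++rank_[a];
  }
 private:
  std::vector<int> parent_, rank_;
};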
 
Step 3: as shown in the code below, the source and sink nodes are placed on the CPU, and nodes that already have an assigned device are not reassigned. Device selection follows two heuristics, Heuristic A and Heuristic B.
// 3. For each node, assign a device based on the constraints in the
// disjoint node set.
std::vector<Device*> devices;
std::vector<Node*> second_pass;
for (Node* node : graph_->nodes()) {
  // Skip the source and sink nodes.
  if (!node->IsOp()) {
    continue;
  }
  // Skip nodes that already have an assigned name.
  if (!node->assigned_device_name().empty()) {
    continue;
  }

  // Heuristic A: prefer to place "generators" with their only
  // consumers.
  //
  // If this is a node with no inputs and a single (non-ref)
  // consumer, we save this for a second pass, so that the
  // consumer's placement is chosen.
  if (IsGeneratorNode(node)) {  // generator node: no inputs, one output, not a reference type
    second_pass.push_back(node);
    continue;
  }

  status = colocation_graph.GetDevicesForNode(node, &devices);
  ...

  // Returns the first device in sorted devices list so we will always
  // choose the same device.
  //
  // TODO(vrv): Factor this assignment out into a pluggable
  // algorithm, so that SimplePlacer is responsible for enforcing
  // preconditions and we can experiment with other algorithms when
  // given a choice of devices. Once we have a better idea of the
  // types of heuristics we want to use and the information needed
  // to perform good placement we can add an interface for this.
  string assigned_device = devices[0]->name();

  // Heuristic B: If the node only operates on metadata, not data,
  // then it is desirable to place that metadata node with its
  // input.
  if (IsMetadataNode(node)) {
    // Make sure that the input device type is in the list of supported
    // device types for this node.
    const Node* input = (*node->in_edges().begin())->src();
    // TODO(vrv): if the input is empty, consider postponing this
    // node's assignment to the second pass, so that we handle the
    // case where a metadata node's input comes from a backedge
    // of a loop.
    const string& input_device_name = input->assigned_device_name();
    if (CanAssignToDevice(input_device_name, devices)) {
      assigned_device = input_device_name;
    }
  }

  // Assign assigned_device to this node. Note that generator nodes matching
  // Heuristic A are not assigned here in step 3; they are handled in step 4.
  AssignAndLog(assigned_device, node);
}
bool IsGeneratorNode(const Node* node) {
  return node->num_inputs() == 0 && node->num_outputs() == 1 &&
         node->out_edges().size() == 1 && !IsRefType(node->output_type(0));
}
bool IsMetadataNode(const Node* node) {
  const string& node_type = node->type_string();
  return (node_type == "Size" || node_type == "Shape" || node_type == "Rank");
}
Step 4: assign devices to the generator nodes deferred in step 3.
// 4. Perform a second pass assignment for those nodes explicitly skipped during the first pass.
...
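
The second pass is elided above. From the Heuristic A comment ("prefer to place generators with their only consumers"), it presumably looks up the feasible devices for each deferred generator node and, where possible, reuses the device already assigned to that node's single consumer. A rough sketch along those lines (inferred from the comments; the exact code in simple_placer.cc may differ):

for (Node* node : second_pass) {
  status = colocation_graph.GetDevicesForNode(node, &devices);
  if (!status.ok()) return AttachDef(status, node->def());
  string assigned_device = devices[0]->name();
  // Heuristic A: place the generator on its single consumer's device,
  // provided that device is feasible for this node.
  const Node* consumer = (*node->out_edges().begin())->dst();
  const string& consumer_device_name = consumer->assigned_device_name();
  if (CanAssignToDevice(consumer_device_name, devices)) {
    assigned_device = consumer_device_name;
  }
  AssignAndLog(assigned_device, node);
}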
