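The device assignment algorithm currently used in TensorFlow v0.9 is the simple placer, which is simpler to implement than the cost model algorithm described in the white paper. The simple placer prefers the /gpu:0 device and does not support multi-GPU assignment.
The cost model mentioned in the white paper balances device assignment according to device resource cost and data transfer cost. It is partially implemented in v0.9 but not yet exposed for use; see core/graph/costmodel.cc.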
 
The simple placer is implemented in core/common_runtime/simple_placer.cc, which contains the core device assignment logic.

A snippet from the test file core/common_runtime/simple_placer_test.cc follows:

////////////////////////////////////////////////////////////////////////////////
//
// A SimplePlacerTest method has three phases:
//
// 1. Build a TensorFlow graph, with no (or partial) device assignments.
// 2. Attempt to compute a placement using the SimplePlacer.
// 3. EITHER: test that the constraints implied by the graph are respected;
//    or that an appropriate error was reported.
//
////////////////////////////////////////////////////////////////////////////////
class SimplePlacerTest : public ::testing::Test {
 protected:
  SimplePlacerTest() {
    // Build a set of 10 GPU and 10 CPU devices.
    // NOTE: this->local_devices_ owns the device objects;
    // this->devices_ contains borrowed pointers to the device
    // objects.
    for (int i = 0; i < 10; ++i) {  // Adds 10 fake CPU and 10 fake GPU devices.
      local_devices_.emplace_back(FakeDevice::MakeCPU(
          strings::StrCat("/job:a/replica:0/task:0/cpu:", i)));
      devices_.AddDevice(local_devices_.back().get());
      // Insert the GPUs in reverse order.
      local_devices_.emplace_back(FakeDevice::MakeGPU(
          strings::StrCat("/job:a/replica:0/task:0/gpu:", 9 - i)));
      devices_.AddDevice(local_devices_.back().get());
    }
  }
  ...
};
...
// Test that a graph with no constraints will successfully assign nodes to the
// "best available" device (i.e. prefer GPU over CPU).
TEST_F(SimplePlacerTest, TestNoConstraints) {
  Graph g(OpRegistry::Global());
  {  // Scope for temporary variables used to construct g.
    // Describe the graph structure with a GraphDefBuilder.
    GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
    Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
    ops::UnaryOp("TestRelu", ops::NodeOut(input, 0), b.opts().WithName("n1"));
    ops::UnaryOp("TestRelu", ops::NodeOut(input, 1), b.opts().WithName("n2"));
    // BuildGraph writes the graph held by the GraphDefBuilder into the Graph.
    TF_EXPECT_OK(BuildGraph(b, &g));
  }
  // Place assigns the nodes of the graph to devices from the device list.
  TF_EXPECT_OK(Place(&g));
  // Expected: "in" is on the CPU while n1 and n2 are on the GPU,
  // i.e. the GPU has higher priority than the CPU.
  EXPECT_DEVICE_TYPE(g, "in", DEVICE_CPU);
  EXPECT_DEVICE_TYPE(g, "n1", DEVICE_GPU);
  EXPECT_DEVICE_TYPE(g, "n2", DEVICE_GPU);
}

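BuildGraph writes the graph structure defined in the GraphDefBuilder object into a Graph. Place assigns the nodes of the graph to devices from the device list; the core of the device assignment algorithm lives in SimplePlacer::Run.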

// Builds the given graph, and (if successful) indexes the node
// names for use in placement, and later lookup.
Status BuildGraph(const GraphDefBuilder& builder, Graph* out_graph) {
  TF_RETURN_IF_ERROR(builder.ToGraph(out_graph));
  nodes_by_name_.clear();
  for (Node* node : out_graph->nodes()) {
    nodes_by_name_[node->name()] = node->id();
  }
  return Status::OK();
}

// Invokes the SimplePlacer on "graph". If no DeviceSet is specified, the
// placement will use the default DeviceSet (of 10 CPU and 10 GPU devices).
//
// REQUIRES: "*graph" was produced by the most recent call to BuildGraph.
Status Place(Graph* graph, DeviceSet* devices, SessionOptions* options) {
  SimplePlacer placer(graph, devices, options);
  return placer.Run();
}
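
The "prefer GPU over CPU" expectation in TestNoConstraints, together with the preference for /gpu:0, can be illustrated with a small standalone sketch. It is not the TensorFlow implementation: ToyDevice, TypePriority and PickDevice are hypothetical names, and the ordering rule (device type priority first, then device name) is an assumption made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a device: just a type and a name.
struct ToyDevice {
  std::string type;  // "GPU" or "CPU"
  std::string name;  // e.g. "/gpu:0"
};

// Assumed priority for this sketch: GPU is preferred over CPU.
int TypePriority(const std::string& type) { return type == "GPU" ? 2 : 1; }

// Sorts the candidates by descending type priority, then by name, and
// returns the first one, so the same device is chosen on every run.
// Returns an empty string when there are no candidates.
std::string PickDevice(std::vector<ToyDevice> candidates) {
  if (candidates.empty()) return "";
  std::sort(candidates.begin(), candidates.end(),
            [](const ToyDevice& a, const ToyDevice& b) {
              const int pa = TypePriority(a.type);
              const int pb = TypePriority(b.type);
              if (pa != pb) return pa > pb;
              return a.name < b.name;
            });
  return candidates.front().name;
}
```

Under this assumed ordering, every unconstrained node that supports a GPU ends up on the same first GPU, which matches the observation that the simple placer does not spread nodes across multiple GPUs.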

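SimplePlacer::Run() is defined in core/common_runtime/simple_placer.cc. Its implementation has four steps:

Steps 1 and 2: iterate over the nodes of the graph and add each one to the ColocationGraph object (the source and sink nodes are excluded), then use the constraint edges to update the disjoint node set.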
// 1. First add all of the nodes. Note that steps (1) and (2)
// requires two passes over the nodes because the graph (and hence
// the constraints) may not be acyclic. (Author's note: so the graph may contain cycles here?)
for (Node* node : graph_->nodes()) {
  // Skip the source and sink nodes.
  if (!node->IsOp()) { continue; }
  status = colocation_graph.AddNode(*node);
  if (!status.ok()) return AttachDef(status, node->def());
}
// 2. Enumerate the constraint edges, and use them to update the disjoint node set.
// (Author's note: a disjoint set, also known as union-find, is a tree-based
// data structure that tracks non-overlapping sets of nodes.)
...
ColocationGraph maintains the connected components of a colocation constraint graph, and uses this information to assign a satisfying device placement to the nodes of the graph.
The implementation uses the union-find algorithm to maintain the connected components efficiently and incrementally as edges (implied by ColocationGraph::ColocateNodes() invocations) are added.
Reference: the Wikipedia article on the disjoint-set (union-find) data structure.
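
To make the union-find idea concrete, here is a minimal standalone sketch of a disjoint set over integer node ids. It is not the ColocationGraph class: the DisjointSet name and its methods are hypothetical, and path halving plus union by rank are common textbook choices rather than something confirmed from the TensorFlow source.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Toy disjoint set over node ids 0..n-1 (hypothetical, for illustration).
class DisjointSet {
 public:
  explicit DisjointSet(int n) : parent_(n), rank_(n, 0) {
    // Initially every node is the root of its own one-element set.
    std::iota(parent_.begin(), parent_.end(), 0);
  }

  // Returns the representative of x's set, shortening the path on the way.
  int Find(int x) {
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];  // path halving
      x = parent_[x];
    }
    return x;
  }

  // Merges the sets that contain a and b (union by rank).
  void Union(int a, int b) {
    int ra = Find(a);
    int rb = Find(b);
    if (ra == rb) return;
    if (rank_[ra] < rank_[rb]) std::swap(ra, rb);
    parent_[rb] = ra;
    if (rank_[ra] == rank_[rb]) ++rank_[ra];
  }

  // True when a and b belong to the same connected component.
  bool SameSet(int a, int b) { return Find(a) == Find(b); }

 private:
  std::vector<int> parent_;
  std::vector<int> rank_;
};
```

In this picture, each colocation constraint corresponds to one Union call, and all nodes that share a representative must be placed on a device that satisfies every member of the group.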
 
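Step 3: as the figure and code below show, the source and sink nodes are assigned to the CPU, and nodes that already have an assigned device are not reassigned. Assignment relies on two heuristics; see Heuristic A and Heuristic B.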
// 3. For each node, assign a device based on the constraints in the
// disjoint node set.
std::vector<Device*> devices;
std::vector<Node*> second_pass;
for (Node* node : graph_->nodes()) {
  // Skip the source and sink nodes.
  if (!node->IsOp()) {
    continue;
  }
  // Skip nodes that already have an assigned name.
  if (!node->assigned_device_name().empty()) {
    continue;
  }
  // Heuristic A: prefer to place "generators" with their only
  // consumers.
  //
  // If this is a node with no inputs and a single (non-ref)
  // consumer, we save this for a second pass, so that the
  // consumer's placement is chosen.
  if (IsGeneratorNode(node)) {  // generator node: no input, one output, not a reference-type node
    second_pass.push_back(node);
    continue;
  }
  status = colocation_graph.GetDevicesForNode(node, &devices);
  ...
  // Returns the first device in sorted devices list so we will always
  // choose the same device.
  //
  // TODO(vrv): Factor this assignment out into a pluggable
  // algorithm, so that SimplePlacer is responsible for enforcing
  // preconditions and we can experiment with other algorithms when
  // given a choice of devices. Once we have a better idea of the
  // types of heuristics we want to use and the information needed
  // to perform good placement we can add an interface for this.
  string assigned_device = devices[0]->name();
  // Heuristic B: If the node only operates on metadata, not data,
  // then it is desirable to place that metadata node with its
  // input.
  if (IsMetadataNode(node)) {
    // Make sure that the input device type is in the list of supported
    // device types for this node.
    const Node* input = (*node->in_edges().begin())->src();
    // TODO(vrv): if the input is empty, consider postponing this
    // node's assignment to the second pass, so that we handle the
    // case where a metadata node's input comes from a backedge
    // of a loop.
    const string& input_device_name = input->assigned_device_name();
    if (CanAssignToDevice(input_device_name, devices)) {
      assigned_device = input_device_name;
    }
  }
  // Assign assigned_device to the node. Generator nodes that match
  // Heuristic A are not assigned here in step 3; that happens in step 4.
  AssignAndLog(assigned_device, node);
}
bool IsGeneratorNode(const Node* node) {
  return node->num_inputs() == 0 && node->num_outputs() == 1 &&
         node->out_edges().size() == 1 && !IsRefType(node->output_type(0));
}

bool IsMetadataNode(const Node* node) {
  const string& node_type = node->type_string();
  return (node_type == "Size" || node_type == "Shape" || node_type == "Rank");
}
Step 4: assign devices to the generator nodes that were deferred in step 3.
// 4. Perform a second pass assignment for those nodes explicitly skipped during the first pass.
...
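
Steps 3 and 4 can be played through on a toy graph. The sketch below is a simplification under stated assumptions, not the SimplePlacer code: ToyNode, IsGenerator, IsMetadata and PlaceTwoPass are hypothetical names, every node is assumed to support every device, reference types are ignored, and default_device stands in for the first device in the sorted candidate list.

```cpp
#include <string>
#include <vector>

// Hypothetical toy node; edges are stored as indices into the node vector.
struct ToyNode {
  std::string name;
  std::string op;              // e.g. "Const", "Shape", "Relu"
  std::vector<int> inputs;     // producer nodes
  std::vector<int> consumers;  // consumer nodes
  std::string device;          // empty until a device is assigned
};

// Generator: no inputs and exactly one consumer (ref types ignored here).
bool IsGenerator(const ToyNode& n) {
  return n.inputs.empty() && n.consumers.size() == 1;
}

// Metadata node: only inspects the shape information of its input.
bool IsMetadata(const ToyNode& n) {
  return n.op == "Size" || n.op == "Shape" || n.op == "Rank";
}

void PlaceTwoPass(std::vector<ToyNode>& nodes,
                  const std::string& default_device) {
  std::vector<int> second_pass;
  for (int i = 0; i < static_cast<int>(nodes.size()); ++i) {
    ToyNode& n = nodes[i];
    if (!n.device.empty()) continue;  // keep existing assignments
    if (IsGenerator(n)) {             // Heuristic A: defer generators
      second_pass.push_back(i);
      continue;
    }
    std::string assigned = default_device;
    if (IsMetadata(n) && !n.inputs.empty()) {  // Heuristic B: follow the input
      const std::string& input_device = nodes[n.inputs[0]].device;
      if (!input_device.empty()) assigned = input_device;
    }
    n.device = assigned;
  }
  // Second pass: each generator follows its only consumer when possible.
  for (int i : second_pass) {
    const std::string& consumer_device = nodes[nodes[i].consumers[0]].device;
    nodes[i].device = consumer_device.empty() ? default_device : consumer_device;
  }
}
```

For example, a Const node whose only consumer is pinned to a CPU device ends up on that CPU device in the second pass, while a Shape node lands on the device of its input rather than on the default device.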
