Multi-GPU processing with data parallelism

If you write your software in a language like C++ for a single CPU core, making it run on multiple GPUs in parallel would require rewriting the software from scratch. But this is not the case with TensorFlow. Because of its symbolic nature, TensorFlow can hide all of that complexity, making it effortless to scale your program across many CPUs and GPUs.

Let’s start with a simple example of adding two vectors on the CPU:

```python
import tensorflow as tf

with tf.device(tf.DeviceSpec(device_type='CPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b

tf.Session().run(c)
```
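Before pinning ops to specific devices, it can be useful to check which devices TensorFlow actually sees. One way to do this (a small addition, not part of the original example) is through the device_lib module:

```python
from tensorflow.python.client import device_lib

# Prints the devices visible to TensorFlow, e.g. '/device:CPU:0', '/device:GPU:0'.
print([d.name for d in device_lib.list_local_devices()])
```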

The same thing can just as easily be done on a GPU:

```python
with tf.device(tf.DeviceSpec(device_type='GPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b
```
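If you run this on a machine without a GPU, the graph will fail with a placement error. One way to guard against that (a general TensorFlow option, not something the original example relies on) is to allow soft placement, which lets TensorFlow fall back to an available device:

```python
# Fall back to an available device (e.g. the CPU) if GPU:0 does not exist,
# instead of raising an error when running the graph.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(c)
```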
But what if we have two GPUs and want to utilize both? To do that, we can split the data and use a separate GPU for processing each half:
```python
split_a = tf.split(a, 2)
split_b = tf.split(b, 2)

split_c = []
for i in range(2):
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        split_c.append(split_a[i] + split_b[i])

c = tf.concat(split_c, axis=0)
```
Let's rewrite this in a more general form so that we can replace addition with any other set of operations:
```python
def make_parallel(fn, num_gpus, **kwargs):
    in_splits = {}
    for k, v in kwargs.items():
        in_splits[k] = tf.split(v, num_gpus)

    out_split = []
    for i in range(num_gpus):
        with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                out_split.append(fn(**{k: v[i] for k, v in in_splits.items()}))

    return tf.concat(out_split, axis=0)


def model(a, b):
    return a + b

c = make_parallel(model, 2, a=a, b=b)
```

You can replace model with any function that takes a set of tensors as input and returns a tensor as a result, with the condition that both the inputs and the output are batched. Note that we also added a variable scope and set reuse to True for every split after the first. This makes sure that we use the same variables for processing all splits, which will come in handy in our next example.
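To see why the reuse flag matters, consider a model function that creates a variable. The following sketch uses a hypothetical linear_model to illustrate: on the first split, tf.get_variable creates "w", and on later splits reuse=True makes it return the same variable instead of raising an error about creating a duplicate:

```python
def linear_model(a, b):
    # Created once for the first split; returned as-is for later splits,
    # because make_parallel sets reuse=True when i > 0.
    w = tf.get_variable("w", shape=[100, 10])
    return tf.matmul(a + b, w)

# With a and b of shape [1000, 100], each GPU processes a [500, 100] slice
# and the concatenated result c has shape [1000, 10].
c = make_parallel(linear_model, 2, a=a, b=b)
```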

Let’s look at a slightly more practical example. We want to train a neural network on multiple GPUs. During training, we need to compute not only the forward pass but also the backward pass (the gradients). But how can we parallelize the gradient computation? This turns out to be pretty easy.

Recall from the first item that we wanted to fit a second-degree polynomial to a set of samples. We reorganized the code a bit to have the bulk of the operations in the model function:

```python
import numpy as np
import tensorflow as tf

def model(x, y):
    w = tf.get_variable("w", shape=[3, 1])

    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)

    loss = tf.square(yhat - y)
    return loss

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

loss = model(x, y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss))

def generate_data():
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    # Ground truth is y = 5x^2 + 0x + 3, so w should converge to ~[5, 0, 3].
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})

print(sess.run(tf.contrib.framework.get_variables_by_name("w")))
```

Now let’s use the make_parallel function that we just wrote to parallelize this. We only need to change two lines of the above code:

```python
loss = make_parallel(model, 2, x=x, y=y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss),
    colocate_gradients_with_ops=True)
```

The only thing we need to change to parallelize backpropagation of the gradients is to set the colocate_gradients_with_ops flag to True. This ensures that each gradient op runs on the same device as its corresponding forward op.
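If you want to confirm that the forward ops and their gradients actually end up on the devices you expect, one option (a debugging aid, not part of the original example) is to enable device-placement logging when creating the session:

```python
# Logs the device assignment of every op when the graph is first run.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
```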
