Multi-GPU processing with data parallelism

If you write your software in a language like C++ for a single CPU core, making it run on multiple GPUs in parallel would require rewriting the software from scratch. But this is not the case with TensorFlow. Because of its symbolic nature, TensorFlow can hide all that complexity, making it effortless to scale your program across many CPUs and GPUs.

Let’s start with the simple example of adding two vectors on CPU:

```python
import tensorflow as tf

with tf.device(tf.DeviceSpec(device_type='CPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b

tf.Session().run(c)
```

The same thing can be done just as simply on the GPU:

```python
with tf.device(tf.DeviceSpec(device_type='GPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b
```

But what if we have two GPUs and want to utilize both? To do that, we can split the data and use a separate GPU for processing each half:
```python
split_a = tf.split(a, 2)
split_b = tf.split(b, 2)

split_c = []
for i in range(2):
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        split_c.append(split_a[i] + split_b[i])

c = tf.concat(split_c, axis=0)
```

Let's rewrite this in a more general form so that we can replace addition with any other set of operations:

```python
def make_parallel(fn, num_gpus, **kwargs):
    in_splits = {}
    for k, v in kwargs.items():
        in_splits[k] = tf.split(v, num_gpus)

    out_split = []
    for i in range(num_gpus):
        with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                out_split.append(fn(**{k: v[i] for k, v in in_splits.items()}))

    return tf.concat(out_split, axis=0)

def model(a, b):
    return a + b

c = make_parallel(model, 2, a=a, b=b)
```

You can replace the model with any function that takes a set of tensors as input and returns a tensor as its result, with the condition that both the input and the output are batched. Note that we also added a variable scope and set reuse to true for all but the first split. This makes sure that we use the same variables for processing all splits, which will come in handy in our next example.
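To see why the reuse matters, here is a minimal sketch with a hypothetical scale_model (not part of the example above) that creates its own variable via tf.get_variable. Because make_parallel enters the variable scope with reuse=i > 0, the variable is created while building the first tower and shared by the second; without that scope, the second tf.get_variable call would fail because the variable already exists.

```python
# Hypothetical model that owns a trainable variable.
def scale_model(a):
    # Created for the first split, reused (not duplicated) for the second.
    scale = tf.get_variable("scale", shape=[], initializer=tf.ones_initializer())
    return a * scale

a = tf.random_uniform([1000, 100])
c = make_parallel(scale_model, 2, a=a)

# A single copy of "scale" is shared by both GPU towers.
print(tf.trainable_variables())
```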

Let’s look at a slightly more practical example. We want to train a neural network on multiple GPUs. During training we not only need to compute the forward pass but also need to compute the backward pass (the gradients). But how can we parallelize the gradient computation? This turns out to be pretty easy.

Recall from the first item that we wanted to fit a second-degree polynomial to a set of samples. We reorganized the code a bit to keep the bulk of the operations in the model function:

```python
import numpy as np
import tensorflow as tf

def model(x, y):
    w = tf.get_variable("w", shape=[3, 1])
    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)
    loss = tf.square(yhat - y)
    return loss

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

loss = model(x, y)
train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss))

def generate_data():
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})

print(sess.run(tf.contrib.framework.get_variables_by_name("w")))
```

Now let’s use make_parallel, which we just wrote, to parallelize this. We only need to change two lines from the code above:

```python
loss = make_parallel(model, 2, x=x, y=y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss),
    colocate_gradients_with_ops=True)
```

The only thing we need to change to parallelize backpropagation of the gradients is to set the colocate_gradients_with_ops flag to True. This ensures that each gradient op runs on the same device as its original op.
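As an optional sanity check (not part of the example above), you can create the session with device placement logging enabled; TensorFlow will then print the device assigned to every op, including the gradient ops, so you can verify that each tower's gradients stay on its GPU:

```python
# Sketch: log the device assignment of every op, including gradient ops.
config = tf.ConfigProto(log_device_placement=True,
                        allow_soft_placement=True)
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())
```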

