TensorFlow: Multi-GPU processing with data parallelism

If you write your software in a language like C++ for a single CPU core, making it run on multiple GPUs in parallel typically means rewriting it from scratch. That is not the case with TensorFlow. Because a TensorFlow program is symbolic (you describe a graph of operations and TensorFlow decides how to execute it), the library can hide much of that complexity, so scaling a program across many CPUs and GPUs takes little extra code.

Let’s start with a simple example: adding two tensors on the CPU:

```python
import tensorflow as tf  # TensorFlow 1.x API

# Pin the ops created in this block to the first CPU device.
with tf.device(tf.DeviceSpec(device_type='CPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b

tf.Session().run(c)
```

The same computation runs just as easily on a GPU:

```python
# Pin the ops created in this block to the first GPU device.
with tf.device(tf.DeviceSpec(device_type='GPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b
```

But what if we have two GPUs and want to utilize both? To do that, we can split the data and use a separate GPU for processing each half:

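```python
# Split both inputs into two halves along the batch (first) axis.
split_a = tf.split(a, 2)
split_b = tf.split(b, 2)

split_c = []
for i in range(2):
    # Process the i-th half on the i-th GPU.
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        split_c.append(split_a[i] + split_b[i])

# Stitch the per-GPU results back into one tensor.
c = tf.concat(split_c, axis=0)
```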
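To see that splitting along the batch axis does not change the result, here is a minimal plain-Python sketch of the same split, compute, concatenate pattern. It does not use TensorFlow or any GPU; `split_rows` and `add_rows` are hypothetical helpers introduced only for illustration. Like `tf.split` with an integer number of splits, the sketch requires the batch size to be divisible by the number of shards.

```python
# Plain-Python stand-in for the TensorFlow graph above. No GPUs are involved;
# split_rows and add_rows are hypothetical helpers for illustration only.

def split_rows(rows, num_shards):
    """Split a list of rows into equal, contiguous shards along the batch axis."""
    if len(rows) % num_shards != 0:
        raise ValueError("number of rows must be divisible by num_shards")
    size = len(rows) // num_shards
    return [rows[i * size:(i + 1) * size] for i in range(num_shards)]

def add_rows(a_rows, b_rows):
    """Element-wise addition of two equally shaped lists of rows."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a_rows, b_rows)]

a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
b = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0], [70.0, 80.0]]

# Each pair of shards would be handled by its own device in the TensorFlow version.
shard_results = [add_rows(sa, sb)
                 for sa, sb in zip(split_rows(a, 2), split_rows(b, 2))]

# Concatenate the shard results along the batch axis.
c_parallel = [row for shard in shard_results for row in shard]

# The same computation done in one piece, for comparison.
c_direct = add_rows(a, b)
```

Because addition treats every row independently, `c_parallel` and `c_direct` hold the same values; only the place where each row is computed differs.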

Let's rewrite this in a more general form so that we can replace addition with any other set of operations:


```python
def make_parallel(fn, num_gpus, **kwargs):
    # Split every input tensor into num_gpus pieces along the batch axis.
    in_splits = {}
    for k, v in kwargs.items():
        in_splits[k] = tf.split(v, num_gpus)

    out_split = []
    for i in range(num_gpus):
        with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
            # The first GPU creates the variables; the others reuse them.
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                out_split.append(fn(**{k : v[i] for k, v in in_splits.items()}))

    # Reassemble the per-GPU outputs into one batch.
    return tf.concat(out_split, axis=0)

def model(a, b):
    return a + b

c = make_parallel(model, 2, a=a, b=b)
```

You can replace the model with any function that takes a set of tensors as input and returns a tensor as output, provided that the first dimension of every input and of the output is the batch dimension. Note that we also added a variable scope and set reuse to true for every GPU after the first (`reuse=i > 0`). This makes sure that all splits are processed with the same variables: the first call creates them and the later calls reuse them. This will come in handy in our next example.
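The batch condition matters because each split is processed without seeing the others. The following plain-Python sketch (no TensorFlow; `run_sharded`, `double` and `center` are hypothetical names chosen for illustration) compares a function that treats each example independently with one that mixes information across the batch:

```python
def run_sharded(fn, batch, num_shards):
    """Apply fn to equal contiguous shards of batch and concatenate the results."""
    size = len(batch) // num_shards
    out = []
    for i in range(num_shards):
        out.extend(fn(batch[i * size:(i + 1) * size]))
    return out

def double(batch):
    """Per-example operation: every output depends on one input only."""
    return [2 * v for v in batch]

def center(batch):
    """Batch-dependent operation: every output depends on the batch mean."""
    mean = sum(batch) / len(batch)
    return [v - mean for v in batch]

batch = [1.0, 2.0, 3.0, 4.0]

# Sharding is transparent for the per-example function...
double_matches = run_sharded(double, batch, 2) == double(batch)

# ...but not for the function that reduces over the batch, because each
# shard subtracts its own mean instead of the mean of the full batch.
center_matches = run_sharded(center, batch, 2) == center(batch)
```

Here `double_matches` is true and `center_matches` is false. A function that computes statistics over the whole batch would therefore give different results when wrapped this way, unless that difference is acceptable for the model.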

Let’s look at a slightly more practical example. We want to train a neural network on multiple GPUs. During training we not only need to compute the forward pass but also need to compute the backward pass (the gradients). But how can we parallelize the gradient computation? This turns out to be pretty easy.

Recall from the first item that we wanted to fit a second-degree polynomial to a set of samples. We reorganized the code a bit so that the bulk of the operations are in the model function:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

def model(x, y):
    # Coefficients of w[0] * x^2 + w[1] * x + w[2].
    w = tf.get_variable("w", shape=[3, 1])

    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)

    # Per-example squared error; the caller reduces it to a scalar.
    loss = tf.square(yhat - y)
    return loss

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

loss = model(x, y)
train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss))

def generate_data():
    # Samples of the target function y = 5 * x^2 + 3.
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})

print(sess.run(tf.contrib.framework.get_variables_by_name("w")))
```

Now let’s use the make_parallel function that we just wrote to parallelize this. Only two statements in the code above need to change:

```python
loss = make_parallel(model, 2, x=x, y=y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss),
    colocate_gradients_with_ops=True)
```

The only thing that we need to change to parallelize backpropagation of gradients is to set the colocate_gradients_with_ops flag to true. This ensures that each gradient op runs on the same device as the forward op it differentiates.
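One way to see why the backward pass can be split across devices: for equal-sized shards, the gradient of the mean loss over the full batch equals the average of the gradients computed on each shard. The sketch below checks this in plain Python for the polynomial model above. It is an illustration under that assumption, not TensorFlow's implementation; `mean_loss_gradient` and the sample values are hypothetical.

```python
def mean_loss_gradient(w, xs, ys):
    """Gradient of mean((w[0]*x^2 + w[1]*x + w[2] - y)^2) with respect to w."""
    grad = [0.0, 0.0, 0.0]
    for x, y in zip(xs, ys):
        features = (x * x, x, 1.0)
        residual = sum(wi * fi for wi, fi in zip(w, features)) - y
        for j in range(3):
            grad[j] += 2.0 * residual * features[j] / len(xs)
    return grad

w = [0.5, -1.0, 2.0]                  # arbitrary current weights
xs = [-2.0, -1.0, 1.0, 3.0]           # a batch of four samples
ys = [5.0 * x * x + 3.0 for x in xs]  # same target function as generate_data

# Gradient computed on the whole batch at once.
full_grad = mean_loss_gradient(w, xs, ys)

# Gradients computed separately on two equal shards, as two devices would.
shard_grads = [mean_loss_gradient(w, xs[:2], ys[:2]),
               mean_loss_gradient(w, xs[2:], ys[2:])]

# Averaging the shard gradients recovers the full-batch gradient.
averaged_grad = [(g0 + g1) / 2.0 for g0, g1 in zip(*shard_grads)]
```

Up to floating-point rounding, `averaged_grad` matches `full_grad`, so each device can compute the gradient contribution of its own shard and only the combined result has to be applied to the shared variables.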

