TensorFlow 多 GPU 处理并行数据
Multi-GPU processing with data parallelism
If you write your software in a language like C++ for a single cpu core, making it run on multiple GPUs in parallel would require rewriting the software from scratch. But this is not the case with TensorFlow. Because of its symbolic nature, tensorflow can hide all that complexity, making it effortless to scale your program across many CPUs and GPUs.
Let’s start with the simple example of adding two vectors on CPU:
import tensorflow as tf
with tf.device(tf.DeviceSpec(device_type='CPU', device_index=0)):
a = tf.random_uniform([1000, 100])
b = tf.random_uniform([1000, 100])
c = a + b
tf.Session().run(c)
The same thing can as simply be done on GPU:
with tf.device(tf.DeviceSpec(device_type='GPU', device_index=0)):
a = tf.random_uniform([1000, 100])
b = tf.random_uniform([1000, 100])
c = a + b
```
But what if we have two GPUs and want to utilize both? To do that, we can split the data and use a separate GPU for processing each half:
```python
split_a = tf.split(a, 2)
split_b = tf.split(b, 2)
split_c = []
for i in range(2):
with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
split_c.append(split_a[i] + split_b[i])
c = tf.concat(split_c, axis=0)
```
Let's rewrite this in a more general form so that we can replace addition with any other set of operations:
<div class="se-preview-section-delimiter"></div>
```python
def make_parallel(fn, num_gpus, **kwargs):
in_splits = {}
for k, v in kwargs.items():
in_splits[k] = tf.split(v, num_gpus)
out_split = []
for i in range(num_gpus):
with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
out_split.append(fn(**{k : v[i] for k, v in in_splits.items()}))
return tf.concat(out_split, axis=0)
def model(a, b):
return a + b
c = make_parallel(model, 2, a=a, b=b)
You can replace the model with any function that takes a set of tensors as input and returns a tensor as result with the condition that both the input and output are in batch. Note that we also added a variable scope and set the reuse to true. This makes sure that we use the same variables for processing both splits. This is something that will become handy in our next example.
Let’s look at a slightly more practical example. We want to train a neural network on multiple GPUs. During training we not only need to compute the forward pass but also need to compute the backward pass (the gradients). But how can we parallelize the gradient computation? This turns out to be pretty easy.
Recall from the first item that we wanted to fit a second degree polynomial to a set of samples. We reorganized the code a bit to have the bulk of the operations in the model function:
import numpy as np
import tensorflow as tf
def model(x, y):
w = tf.get_variable("w", shape=[3, 1])
f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
yhat = tf.squeeze(tf.matmul(f, w), 1)
loss = tf.square(yhat - y)
return loss
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
loss = model(x, y)
train_op = tf.train.AdamOptimizer(0.1).minimize(
tf.reduce_mean(loss))
def generate_data():
x_val = np.random.uniform(-10.0, 10.0, size=100)
y_val = 5 * np.square(x_val) + 3
return x_val, y_val
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(1000):
x_val, y_val = generate_data()
_, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})
_, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})
print(sess.run(tf.contrib.framework.get_variables_by_name("w")))
Now let’s use make_parallel that we just wrote to parallelize this. We only need to change two lines of code from the above code:
loss = make_parallel(model, 2, x=x, y=y)
train_op = tf.train.AdamOptimizer(0.1).minimize(
tf.reduce_mean(loss),
colocate_gradients_with_ops=True)
The only thing that we need to change to parallelize backpropagation of gradients is to set the colocate_gradients_with_ops flag to true. This ensures that gradient ops run on the same device as the original op.
更多教程:http://www.tensorflownews.com/
TensorFlow 多 GPU 处理并行数据的更多相关文章
- Setup Tensorflow with GPU on Mac OSX 10.11
Setup Tensorflow with GPU on OSX 10.11 环境描述 电脑:MacBook Pro 15.6 CPU: 2.7GHz 显卡: GT 650m 系统:OSX 10.11 ...
- linux 安装tensorflow(gpu版本)
一.安装cuda 具体安装过程见我的另一篇博客,ubuntu16.04下安装配置深度学习环境 二.安装tensorflow 1.具体安装过程官网其实写的比较详细,总结一下的话可以分为两种:安装rele ...
- Tensorflow检验GPU是否安装成功 及 使用GPU训练注意事项
1. 已经安装cuda但是tensorflow仍然使用cpu加速的问题 电脑上同时安装了GPU和CPU版本的TensorFlow,本来想用下面代码测试一下GPU程序,但无奈老是没有调用GPU. imp ...
- Ubuntu16.04下安装tensorflow(GPU加速)【转】
本文转载自:https://blog.csdn.net/qq_30520759/article/details/78947034 版权声明:本文为博主原创文章,未经博主允许不得转载. https:// ...
- tensorflow 安装GPU版本,个人总结,步骤比较详细【转】
本文转载自:https://blog.csdn.net/gangeqian2/article/details/79358543 手把手教你windows安装tensorflow的教程参考另一篇博文ht ...
- Google TensorFlow for GPU安装、配置大坑
Google TensorFlow for GPU安装.配置大坑 从本周一开始(12.05),共4天半的时间,终于折腾好Google TensorFlow for GPU版本,其间跳坑无数,摔得遍体鳞 ...
- Win10 TensorFlow(gpu)安装详解
Win10 TensorFlow(gpu)安装详解 写在前面:TensorFlow是谷歌基于DistBelief进行研发的第二代人工智能学习系统,其命名来源于本身的运行原理.Tensor(张量)意味着 ...
- [开发技巧]·TensorFlow&Keras GPU使用技巧
[开发技巧]·TensorFlow&Keras GPU使用技巧 1.问题描述 在使用TensorFlow&Keras通过GPU进行加速训练时,有时在训练一个任务的时候需要去测试结果 ...
- tensorflow with gpu 环境配置
1.准备工作 1.1 确保GPU驱动已经安装 lspci | grep -i nvidia 通过此命令可以查看GPU信息,测试机已经安装GPU驱动
随机推荐
- 代码演示C#各版本新功能
代码演示C#各版本新功能 C#各版本新功能其实都能在官网搜到,但很少有人整理在一起,并通过非常简短的代码将每个新特性演示出来. 代码演示C#各版本新功能 C# 2.0版 - 2005 泛型 分部类型 ...
- C++扬帆远航——2
/* * Copyright (c) 2016,烟台大学计算机与控制工程学院 * All rights reserved. * 文件名:test.cpp * 作者:常轩 * 完成日期:2016年3月6 ...
- C++走向远洋——30(六周,项目一1.0)
*/ * Copyright (c) 2016,烟台大学计算机与控制工程学院 * All rights reserved. * 文件名:fenshu.cpp * 作者:常轩 * 微信公众号:World ...
- 从0开发3D引擎(十):使用领域驱动设计,从最小3D程序中提炼引擎(上)
目录 上一篇博文 下一篇博文 前置知识 回顾上文 最小3D程序完整代码地址 通用语言 将会在本文解决的不足之处 本文流程 解释本文使用的领域驱动设计的一些概念 本文的领域驱动设计选型 设计 引擎名 识 ...
- vue+express+mysql项目总结(node项目部署阿里云通用)
原文发布于我的个人博客上:原文点这里 前面经历千辛万苦,终于把博客的所有东西都准备好了,现在就只等部署了.下面我介绍下我的部署过程: 一.购买服务器和域名 如果需要域名(不用域名通过ip也可以 ...
- LeetCode 81.Search in Rotated Sorted Array II(M)
题目: Suppose an array sorted in ascending order is rotated at some pivot unknown to you beforehand. ( ...
- day06可变与不可变类型,if判断,运算符
1:可变不可变类型 2.什么是条件?什么可以当做条件?为何要要用条件? 显式布尔值:True.False 隐式布尔值:所有数据类型,其中0.None.空为假 3:逻辑运算符:用来 # not. and ...
- JAVA有关位运算的全套梳理
一.在计算机中数据是如何进行计算的? 1.1:java中的byte型数据取值范围 我们最开始学习java的时候知道,byte类型的数据占了8个bit位,每个位上或0或1,左边第一位表示符号位,符号位如 ...
- React官方脚手架不支持less问题解决
create-react-app是由React官方提供,并推荐构建React单页应用程序的最佳方法,但是默认不支持less,需要手动集成: 1,必须手动安装less npm install less ...
- MATLAB神经网络(2) BP神经网络的非线性系统建模——非线性函数拟合
2.1 案例背景 在工程应用中经常会遇到一些复杂的非线性系统,这些系统状态方程复杂,难以用数学方法准确建模.在这种情况下,可以建立BP神经网络表达这些非线性系统.该方法把未知系统看成是一个黑箱,首先用 ...