TensorFlow Multi-GPU Processing with Data Parallelism
If you write your software in a language like C++ for a single CPU core, making it run on multiple GPUs in parallel would require rewriting the software from scratch. But this is not the case with TensorFlow. Because of its symbolic nature, TensorFlow can hide all of that complexity, making it effortless to scale your program across many CPUs and GPUs.
Let's start with a simple example: adding two tensors on the CPU:
```python
import tensorflow as tf

with tf.device(tf.DeviceSpec(device_type='CPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b

tf.Session().run(c)
```
The same thing can just as easily be done on a GPU:
```python
with tf.device(tf.DeviceSpec(device_type='GPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b
```
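Before splitting work across devices, it can help to confirm which devices TensorFlow actually sees. A minimal sketch using `device_lib` (available in TensorFlow 1.x, though not part of the stable public API):

```python
from tensorflow.python.client import device_lib

# Prints one line per visible device, e.g. /device:CPU:0, /device:GPU:0, ...
for device in device_lib.list_local_devices():
    print(device.name)
```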
But what if we have two GPUs and want to utilize both? To do that, we can split the data and use a separate GPU for processing each half:
```python
split_a = tf.split(a, 2)
split_b = tf.split(b, 2)
split_c = []
for i in range(2):
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        split_c.append(split_a[i] + split_b[i])
c = tf.concat(split_c, axis=0)
```
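As a sanity check, the split version should produce exactly the same values as the single-device version: `tf.concat` simply stitches the two halves back together along the batch dimension. A quick sketch, assuming the graph above:

```python
with tf.Session() as sess:
    c_val = sess.run(c)
    # Two (500, 100) halves concatenated along axis 0 give back (1000, 100).
    print(c_val.shape)
```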
Let's rewrite this in a more general form so that we can replace addition with any other set of operations:
```python
def make_parallel(fn, num_gpus, **kwargs):
    # Split every input tensor into num_gpus pieces along the batch dimension.
    in_splits = {}
    for k, v in kwargs.items():
        in_splits[k] = tf.split(v, num_gpus)

    out_split = []
    for i in range(num_gpus):
        with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
            # Reuse the same variables on every GPU after the first.
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                out_split.append(fn(**{k: v[i] for k, v in in_splits.items()}))

    # Stitch the per-GPU outputs back together along the batch dimension.
    return tf.concat(out_split, axis=0)

def model(a, b):
    return a + b

c = make_parallel(model, 2, a=a, b=b)
```
You can replace model with any function that takes a set of tensors as input and returns a tensor as output, provided that both the inputs and the output are batched (split and concatenated along the first dimension). Note that we also added a variable scope and set reuse to True for every GPU after the first. This makes sure that we use the same variables for processing both splits, which will come in handy in our next example.
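To see the variable sharing in action, here is a hypothetical model (the name scaled_model and its variable "scale" are illustrative, not from the original) that creates a variable with tf.get_variable. Because reuse is True for every tower after the first, only one copy of the variable exists in the graph:

```python
# Hypothetical model with a trainable variable; both GPU towers share it.
def scaled_model(a, b):
    scale = tf.get_variable("scale", shape=[], initializer=tf.ones_initializer())
    return scale * (a + b)

c = make_parallel(scaled_model, 2, a=a, b=b)
print(tf.trainable_variables())  # a single variable 'scale:0', not one per GPU
```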
Let's look at a slightly more practical example: training a neural network on multiple GPUs. During training we need to compute not only the forward pass but also the backward pass (the gradients). But how can we parallelize the gradient computation? This turns out to be pretty easy.
Recall the earlier example in which we fit a second-degree polynomial to a set of samples. We reorganized the code a bit to have the bulk of the operations in the model function:
```python
import numpy as np
import tensorflow as tf

def model(x, y):
    # Coefficients of the second-degree polynomial.
    w = tf.get_variable("w", shape=[3, 1])
    # Features [x^2, x, 1] for each sample, stacked along axis 1.
    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)
    loss = tf.square(yhat - y)
    return loss

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

loss = model(x, y)
train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss))

def generate_data():
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})

print(sess.run(tf.contrib.framework.get_variables_by_name("w")))
```
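Since generate_data produces noise-free samples of y = 5x² + 3, the learned coefficients should converge to roughly [5, 0, 3].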
Now let's use the make_parallel function we just wrote to parallelize this. We only need to change two lines of code:
```python
loss = make_parallel(model, 2, x=x, y=y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss),
    colocate_gradients_with_ops=True)
```
The only thing we need to change to parallelize backpropagation is to set the colocate_gradients_with_ops flag to True. This ensures that each gradient op is placed on the same device as its corresponding forward op, so the backward pass is split across the GPUs in the same way as the forward pass.
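If you want to verify the placement, one option is to enable device placement logging when creating the session; TensorFlow then prints the device assigned to each op, gradient ops included. A minimal sketch:

```python
# Log every op's device assignment; with colocate_gradients_with_ops set,
# gradient ops should appear on /device:GPU:0 and /device:GPU:1 alongside
# the forward ops they belong to.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
```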