Reducing and Profiling GPU Memory Usage in Keras with TensorFlow Backend
Keras: allocating GPU memory on demand & clearing unused variables to free GPU memory
Intro
Are you running out of GPU memory when using keras or tensorflow deep learning models, but only some of the time?
Are you curious about exactly how much GPU memory your tensorflow model uses during training?
Are you wondering if you can run two or more keras models on your GPU at the same time?
Background
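By default, tensorflow pre-allocates nearly all of the available GPU memory. That is bad for a variety of use cases, especially production (where you may want several models to share one GPU) and memory profiling (where you want to know how much memory a model really needs).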
When keras uses tensorflow for its back-end, it inherits this behavior.
Setting tensorflow GPU memory options
For new models
Thankfully, tensorflow allows you to change how it allocates GPU memory, and to set a limit on how much GPU memory it is allowed to allocate.
Let’s set GPU options on keras’s example “Sequence classification with LSTM” network:
# keras example imports
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM

# extra imports to set GPU options
import tensorflow as tf
from keras import backend as k

###################################
# TensorFlow wizardry (TensorFlow 1.x API)
config = tf.ConfigProto()
# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True
# Only allow a total of half the GPU memory to be allocated
# config.gpu_options.per_process_gpu_memory_fraction = 0.5
# Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))
###################################

max_features = 1024  # vocabulary size (assumed value; set it to match your data)

model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# x_train, y_train, x_test, y_test: your own integer-encoded sequences and 0/1 labels
model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)
After the above, when we create the sequence classification model, it won’t grab nearly all of the GPU memory up front. Instead, it will allocate GPU memory as needed during the calls to model.fit() and model.evaluate().
Additionally, if you uncomment the per_process_gpu_memory_fraction = 0.5 line, tensorflow will only allocate up to half of the GPU’s total memory.
If the model tries to allocate more than that, tensorflow will throw a ResourceExhaustedError, and you’ll get a lengthy stack trace.
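per_process_gpu_memory_fraction is a fraction of the GPU’s total memory, not a number of megabytes, so it can help to convert between the two. This is especially useful when you want several processes to share one card. The sketch below is plain Python. The helper names `fraction_for_budget`, `cap_mib` and `fraction_per_process` are hypothetical. The 8192 MiB card size and the 5% headroom are assumed example values. The cap tensorflow actually enforces may differ slightly because of allocator overhead.

```python
def fraction_for_budget(budget_mib: float, total_mib: float) -> float:
    """Fraction of the GPU's total memory corresponding to budget_mib (hypothetical helper)."""
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    if not 0 < budget_mib <= total_mib:
        raise ValueError("budget_mib must be in (0, total_mib]")
    return budget_mib / total_mib


def cap_mib(fraction: float, total_mib: float) -> float:
    """Approximate memory cap in MiB implied by a fraction; ignores allocator overhead."""
    return fraction * total_mib


def fraction_per_process(n_processes: int, headroom: float = 0.05) -> float:
    """Split one GPU evenly between n processes, keeping some memory in reserve.

    headroom is an assumed safety margin (as a fraction of total memory)
    left for CUDA contexts and other processes; tune it for your setup.
    """
    if n_processes < 1:
        raise ValueError("n_processes must be >= 1")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return (1.0 - headroom) / n_processes


if __name__ == "__main__":
    total = 8192  # assumed card size in MiB
    print(cap_mib(0.5, total))               # 4096.0
    print(fraction_for_budget(2048, total))  # 0.25
    print(fraction_per_process(2))           # 0.475
    # In the TF 1.x setup above you could then write, for example:
    # config.gpu_options.per_process_gpu_memory_fraction = fraction_per_process(2)
```

With two processes on one card, each would get a little under half the memory. The margin leaves room for the CUDA context each process creates.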
If you have a Linux machine and an NVIDIA card, you can watch nvidia-smi to see how much GPU memory is in use, or configure a monitoring tool like Monitorix to generate graphs for you.
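You may prefer to log the numbers from a script rather than read nvidia-smi by eye. nvidia-smi can print just the memory fields as CSV. The following is a minimal sketch that assumes nvidia-smi is on your PATH. `parse_memory_csv` and `query_gpu_memory` are hypothetical helper names. The parser is kept separate so it can be checked without a GPU.

```python
import shutil
import subprocess
from typing import List, Tuple

# Ask nvidia-smi for per-GPU memory figures only, in MiB, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text: str) -> List[Tuple[int, int, int]]:
    """Turn lines like '0, 1234, 8192' into (gpu_index, used_mib, total_mib) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def query_gpu_memory() -> List[Tuple[int, int, int]]:
    """Run nvidia-smi once and return the parsed rows (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    for gpu, used, total in query_gpu_memory():
        print(f"GPU {gpu}: {used} / {total} MiB in use")
```

You could call `query_gpu_memory()` in a loop, say once a second, and append the rows to a file. This gives a rough log similar to what Monitorix graphs.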

GPU memory usage, as shown in Monitorix for Linux
For a model that you’re loading
We can even set GPU memory management options for a model that’s already created and trained, and that we’re loading from disk for deployment or for further training.
For that, let’s tweak keras’s load_model example:
# keras example imports
from keras.models import load_model

# extra imports to set GPU options
import tensorflow as tf
from keras import backend as k

###################################
# TensorFlow wizardry (TensorFlow 1.x API)
config = tf.ConfigProto()
# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True
# Only allow a total of half the GPU memory to be allocated
config.gpu_options.per_process_gpu_memory_fraction = 0.5
# Create a session with the above options specified (before loading the model).
k.tensorflow_backend.set_session(tf.Session(config=config))
###################################

# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
# TODO: classify all the things
Now, with your loaded model, you can open your favorite GPU monitoring tool and watch how the GPU memory usage changes under different loads.
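You can also put a number on how usage changes under different loads. One option is to sample memory use in a background thread while the model runs and keep the peak. The sketch below defines a hypothetical `PeakMemorySampler` class. It accepts any zero-argument function that returns the current usage in MiB, for example a small wrapper around `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`, so the sampling logic does not depend on one tool. Be aware that polling at a fixed interval can miss short spikes. Also, with allow_growth the reading reflects what tensorflow has reserved, not only what the model strictly needs.

```python
import threading
import time
from typing import Callable, List, Optional


class PeakMemorySampler:
    """Record GPU memory readings in a background thread while a block of code runs.

    read_used_mib: any zero-argument callable returning current usage in MiB
    (hypothetical; e.g. a wrapper around nvidia-smi).
    """

    def __init__(self, read_used_mib: Callable[[], float], interval_s: float = 0.5):
        self._read = read_used_mib
        self._interval = interval_s
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.samples: List[float] = []

    def _run(self) -> None:
        # Take a reading, then wait; Event.wait returns True once stop is requested.
        while True:
            self.samples.append(self._read())
            if self._stop.wait(self._interval):
                break

    def __enter__(self) -> "PeakMemorySampler":
        self._stop.clear()
        self.samples = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc_info) -> bool:
        self._stop.set()
        self._thread.join()
        self.samples.append(self._read())  # one final reading after the workload
        return False  # never swallow exceptions raised by the workload

    @property
    def peak_mib(self) -> Optional[float]:
        return max(self.samples) if self.samples else None


if __name__ == "__main__":
    # Demo with a fake reader; swap in a real nvidia-smi wrapper on a GPU machine.
    fake_readings = iter([500, 1500, 3200, 2800])

    def fake_reader() -> float:
        return next(fake_readings, 2800)

    with PeakMemorySampler(fake_reader, interval_s=0.01) as sampler:
        time.sleep(0.1)  # stand-in for model.predict(...) or model.fit(...)
    print("peak MiB:", sampler.peak_mib)
```

With a real model, you would wrap each load you care about, such as a single prediction or a large batch, in its own `with` block and compare the `peak_mib` values.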
Conclusion
Good news, everyone! That sweet deep learning model you just made doesn’t actually need all the memory it usually claims!
And, now that you can tell tensorflow not to pre-allocate memory, you can get a much better idea of what kind of rig(s) you need in order to deploy your model into production.
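Once you have a measured peak per model, for example from nvidia-smi, turning it into a rough hardware estimate is simple arithmetic. The following is a sketch with hypothetical helper names. The 512 MiB kept free on each GPU is an assumed safety margin, not a tensorflow figure, so confirm any estimate under real load.

```python
import math


def instances_per_gpu(peak_mib: float, total_mib: float, headroom_mib: float = 512) -> int:
    """How many copies of a model with the given peak usage fit on one GPU.

    headroom_mib is an assumed safety margin kept free on each GPU.
    """
    if peak_mib <= 0:
        raise ValueError("peak_mib must be positive")
    usable = total_mib - headroom_mib
    return max(0, math.floor(usable / peak_mib))


def gpus_needed(n_instances: int, peak_mib: float, total_mib: float,
                headroom_mib: float = 512) -> int:
    """Minimum number of identical GPUs needed to host n_instances copies."""
    per_gpu = instances_per_gpu(peak_mib, total_mib, headroom_mib)
    if per_gpu == 0:
        raise ValueError("a single instance does not fit on this GPU")
    return math.ceil(n_instances / per_gpu)


if __name__ == "__main__":
    # Assumed example: a model peaking at 1800 MiB on an 8192 MiB card.
    print(instances_per_gpu(1800, 8192))  # 4  (7680 / 1800 = 4.27, rounded down)
    print(gpus_needed(10, 1800, 8192))    # 3  (ceil(10 / 4))
```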
Is this how you’re handling GPU memory management issues with tensorflow or keras?
Did I miss a better, cleaner way of handling GPU memory allocation with tensorflow and keras?
Let me know in the comments!
How to remove stale models from GPU memory
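import gc
from keras import backend as K
from keras.models import Model, load_model

tmp_model_name = 'tmp_model.h5'  # hypothetical path for the temporary copy

m = Model(.....)                # your model (construction elided)
m.save(tmp_model_name)          # save architecture + weights (+ optimizer state) to disk
del m                           # drop the Python reference to the model
K.clear_session()               # destroy the current TF graph/session and reset Keras state
gc.collect()                    # force garbage collection of the freed objects
m = load_model(tmp_model_name)  # reload the model into a fresh graph/session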