Reducing and Profiling GPU Memory Usage in Keras with TensorFlow Backend
keras 自适应分配显存 & 清理不用的变量释放 GPU 显存
Intro
Are you running out of GPU memory when using keras
or tensorflow
deep learning models, but only some of the time?
Are you curious about exactly how much GPU memory your tensorflow
model uses during training?
Are you wondering if you can run two or more keras
models on your GPU at the same time?
Background
By default, tensorflow
pre-allocates nearly all of the available GPU memory, which is bad for a variety of use cases, especially production and memory profiling.
When keras
uses tensorflow
for its back-end, it inherits this behavior.
Setting tensorflow
GPU memory options
For new models
Thankfully, tensorflow
allows you to change how it allocates GPU memory, and to set a limit on how much GPU memory it is allowed to allocate.
Let’s set GPU options on keras
‘s example Sequence classification with LSTM network
## keras example imports
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM ## extra imports to set GPU options
import tensorflow as tf
from keras import backend as k ###################################
# TensorFlow wizardry
config = tf.ConfigProto() # Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True # Only allow a total of half the GPU memory to be allocated
#config.gpu_options.per_process_gpu_memory_fraction = 0.5 # Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))
################################### model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid')) model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy']) model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)
After the above, when we create the sequence classification model, it won’t use half the GPU memory automatically, but rather will allocate GPU memory as-needed during the calls to model.fit()
and model.evaluate()
.
Additionally, with the per_process_gpu_memory_fraction = 0.5
, tensorflow
will only allocate a total of half the available GPU memory.
If it tries to allocate more than half of the total GPU memory, tensorflow
will throw a ResourceExhaustedError
, and you’ll get a lengthy stack trace.
If you have a Linux machine and an nvidia card, you can watch nvidia-smi
to see how much GPU memory is in use, or can configure a monitoring tool like monitorix
to generate graphs for you.

GPU memory usage, as shown in Monitorix for Linux
For a model that you’re loading
We can even set GPU memory management options for a model that’s already created and trained, and that we’re loading from disk for deployment or for further training.
For that, let’s tweak keras
‘s load_model
example:
# keras example imports
from keras.models import load_model ## extra imports to set GPU options
import tensorflow as tf
from keras import backend as k ###################################
# TensorFlow wizardry
config = tf.ConfigProto() # Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True # Only allow a total of half the GPU memory to be allocated
config.gpu_options.per_process_gpu_memory_fraction = 0.5 # Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))
################################### # returns a compiled model
# identical to the previous one
model = load_model('my_model.h5') # TODO: classify all the things
Now, with your loaded model, you can open your favorite GPU monitoring tool and watch how the GPU memory usage changes under different loads.
Conclusion
Good news everyone! That sweet deep learning model you just made doesn’t actually need all that memory it usually claims!
And, now that you can tell tensorflow
not to pre-allocate memory, you can get a much better idea of what kind of rig(s) you need in order to deploy your model into production.
Is this how you’re handling GPU memory management issues with tensorflow
or keras
?
Did I miss a better, cleaner way of handling GPU memory allocation with tensorflow
and keras
?
Let me know in the comments!
How to remove stale models from GPU memory
import gc
m = Model(.....)
m.save(tmp_model_name)
del m
K.clear_session()
gc.collect()
m = load_model(tmp_model_name)
Reducing and Profiling GPU Memory Usage in Keras with TensorFlow Backend的更多相关文章
- GPU Memory Usage占满而GPU-Util却为0的调试
最近使用github上的一个开源项目训练基于CNN的翻译模型,使用THEANO_FLAGS='floatX=float32,device=gpu2,lib.cnmem=1' python run_nn ...
- Allowing GPU memory growth
By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICE ...
- Redis: Reducing Memory Usage
High Level Tips for Redis Most of Stream-Framework's users start out with Redis and eventually move ...
- Android 性能优化(21)*性能工具之「GPU呈现模式分析」Profiling GPU Rendering Walkthrough:分析View显示是否超标
Profiling GPU Rendering Walkthrough 1.In this document Prerequisites Profile GPU Rendering $adb shel ...
- Memory usage of a Java process java Xms Xmx Xmn
http://www.oracle.com/technetwork/java/javase/memleaks-137499.html 3.1 Meaning of OutOfMemoryError O ...
- Shell script for logging cpu and memory usage of a Linux process
Shell script for logging cpu and memory usage of a Linux process http://www.unix.com/shell-programmi ...
- 5 commands to check memory usage on Linux
Memory Usage On linux, there are commands for almost everything, because the gui might not be always ...
- SHELL:Find Memory Usage In Linux (统计每个程序内存使用情况)
转载一个shell统计linux系统中每个程序的内存使用情况,因为内存结构非常复杂,不一定100%精确,此shell可以在Ghub上下载. [root@db231 ~]# ./memstat.sh P ...
- Why does the memory usage increase when I redeploy a web application?
That is because your web application has a memory leak. A common issue are "PermGen" memor ...
随机推荐
- (转)python高级:列表解析和生成表达式
一.语法糖的概念 “糖”,可以理解为简单.简洁,“语法糖”使我们可以更加简洁.快速的实现这些功能. 只是Python解释器会把这些特定格式的语法翻译成原本那样复杂的代码逻辑 我们使用的语法糖有: if ...
- MySQl资料链接
原文:http://www.uml.org.cn/sjjm/sjjm-MySql.asp MySQl MySQL高可用数据库内核深度优化的四重定制 MySQL数据表存储引擎类型及特性 My ...
- Linux grep命令详解[备份]
linux grep命令 1.作用Linux系统中grep命令是一种强大的文本搜索工具,它能使用正则表达式搜索文本,并把匹 配的行打印出来.grep全称是Global Regular Expressi ...
- scala combineByKey用法说明
语法是: combineByKey[C]( createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: ( ...
- python字符串中包含Unicode插入数据库乱码问题 分类: Python 2015-04-28 18:19 342人阅读 评论(0) 收藏
之前在编码的时候遇到一个奇葩的问题,无论如何操作,写入数据库的字符都是乱码,之后是这样解决的,意思就是先解码,然后再插入数据库 cost_str = json.dumps(cost_info) cos ...
- LogCat里的错误提示 FATAL EXCEPTION: main
程序一运行闪退. 原因为包冲突,将冲突的包删除即可.
- thread 带参数
在 .NET Framework 2.0 版中,要实现线程调用带参数的方法有两种办法. 第一种:使用ParameterizedThreadStart. 调用 System.Threading.Thre ...
- #define a int[10]与 typedef int a[10]用法
// #define a int[10] #include <stdio.h> #include <stdlib.h> #define a int[10] int main() ...
- NMS—卷积神经网络
1-传统的NMS NMS,非极大值抑制,在很多计算机视觉问题中有着重要应用,尤其是目标检测领域. 以人脸检测为例,通常的流程为3步: (1)通过滑动窗口或者其它的object proposals方法产 ...
- vue 监听对象里的特定数据
vue 监听对象里的特定数据变化 通常是这样写的,只能监听某一个特定数据 watch: { params: function(val) { console.log(val) this.$ajax.g ...