keras: allocating GPU memory as needed & clearing unused variables to free GPU memory

Intro

Are you running out of GPU memory when using keras or tensorflow deep learning models, but only some of the time?

Are you curious about exactly how much GPU memory your tensorflow model uses during training?

Are you wondering if you can run two or more keras models on your GPU at the same time?

Background

By default, tensorflow pre-allocates nearly all of the available GPU memory, which is bad for a variety of use cases, especially production and memory profiling.

When keras uses tensorflow for its back-end, it inherits this behavior.
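
You can see this for yourself with nothing but a bare session; a minimal illustration (TF 1.x, on a machine with a GPU):

import tensorflow as tf

# With the default options, this grabs nearly all of the free GPU
# memory up front, even though no graph has been run yet.
sess = tf.Session()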

Setting tensorflow GPU memory options

For new models

Thankfully, tensorflow allows you to change how it allocates GPU memory, and to set a limit on how much GPU memory it is allowed to allocate.

Let's set GPU options on keras's example, Sequence classification with LSTM:

 
## keras example imports
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM

## extra imports to set GPU options
import tensorflow as tf
from keras import backend as k

###################################
# TensorFlow wizardry
config = tf.ConfigProto()
# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True
# Only allow a total of half the GPU memory to be allocated
config.gpu_options.per_process_gpu_memory_fraction = 0.5
# Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))
###################################

max_features = 1024  # vocabulary size, as in the keras example

model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# x_train, y_train, x_test and y_test are assumed to be prepared
# as in the keras example.
model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)

After the above, when we create the sequence classification model, it won't grab its share of GPU memory up front; instead, it will allocate GPU memory as needed during the calls to model.fit() and model.evaluate().

Additionally, with per_process_gpu_memory_fraction = 0.5, tensorflow will only allocate a total of half the available GPU memory.

If it tries to allocate more than half of the total GPU memory, tensorflow will throw a ResourceExhaustedError, and you’ll get a lengthy stack trace.
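
If a stack trace is too ungraceful for you, one option (my own sketch, reusing the model and data from the example above) is to catch the error and retry with a smaller batch:

import tensorflow as tf

try:
    model.fit(x_train, y_train, batch_size=16, epochs=10)
except tf.errors.ResourceExhaustedError:
    # The model wanted more than the fraction we allowed; retry with
    # a smaller batch size (or raise the memory fraction).
    model.fit(x_train, y_train, batch_size=8, epochs=10)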

If you have a Linux machine and an nvidia card, you can run watch -n 1 nvidia-smi to see how much GPU memory is in use, or configure a monitoring tool like monitorix to generate graphs for you.

[Figure: GPU memory usage, as shown in Monitorix for Linux]
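
If you'd rather profile from inside the training loop, a small Keras callback can poll nvidia-smi for you. This is my own sketch, not part of the keras API; the GPUMemoryLogger name is made up, and it assumes nvidia-smi is on your PATH:

import subprocess
from keras.callbacks import Callback

class GPUMemoryLogger(Callback):
    """Log GPU memory usage via nvidia-smi at the end of each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        used = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=memory.used',
             '--format=csv,noheader']).decode().strip()
        print('epoch %d: GPU memory used: %s' % (epoch, used))

# Pass it to fit() alongside your other arguments:
# model.fit(x_train, y_train, batch_size=16, epochs=10,
#           callbacks=[GPUMemoryLogger()])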

For a model that you’re loading

We can even set GPU memory management options for a model that’s already created and trained, and that we’re loading from disk for deployment or for further training.

For that, let's tweak keras's load_model example:

 
# keras example imports
from keras.models import load_model

## extra imports to set GPU options
import tensorflow as tf
from keras import backend as k

###################################
# TensorFlow wizardry
config = tf.ConfigProto()
# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True
# Only allow a total of half the GPU memory to be allocated
config.gpu_options.per_process_gpu_memory_fraction = 0.5
# Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))
###################################

# returns a compiled model, identical to the previous one
model = load_model('my_model.h5')
# TODO: classify all the things

Now, with your loaded model, you can open your favorite GPU monitoring tool and watch how the GPU memory usage changes under different loads.
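
For example, one quick (hypothetical) way to generate different loads is to run predictions over increasing batch sizes while your monitor is open; the input shape below is a placeholder you'd swap for your model's real one:

import numpy as np

# Feed random batches of increasing size and watch the allocator grow.
# The input shape (100,) is a placeholder; match your model's input.
for batch_size in (16, 64, 256, 1024):
    model.predict(np.random.rand(batch_size, 100))
    print('ran a batch of %d' % batch_size)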

Conclusion

Good news everyone! That sweet deep learning model you just made doesn’t actually need all that memory it usually claims!

And, now that you can tell tensorflow not to pre-allocate memory, you can get a much better idea of what kind of rig(s) you need in order to deploy your model into production.

Is this how you’re handling GPU memory management issues with tensorflow or keras?

Did I miss a better, cleaner way of handling GPU memory allocation with tensorflow and keras?

Let me know in the comments!

 
 
====================================================================================
 

How to remove stale models from GPU memory

import gc
from keras.models import Model, load_model
from keras import backend as K

m = Model(.....)                # build and train your model as usual
m.save(tmp_model_name)          # persist weights before tearing down
del m                           # drop the Python reference
K.clear_session()               # destroy the TF graph and free GPU memory
gc.collect()                    # make sure the garbage collector runs
m = load_model(tmp_model_name)  # reload into a fresh session
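
This pattern really pays off in a loop, e.g. a hyperparameter sweep; here's a sketch where build_model and the training data are placeholders for your own code:

import gc
from keras import backend as K

# Without clear_session(), every iteration would leave its graph and
# weights on the GPU, and memory usage would only ever grow.
for units in (64, 128, 256):
    m = build_model(units)  # hypothetical model-building function
    m.fit(x_train, y_train, epochs=1)
    m.save('model_%d.h5' % units)
    del m
    K.clear_session()
    gc.collect()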

References: https://michaelblogscode.wordpress.com/2017/10/10/reducing-and-profiling-gpu-memory-usage-in-keras-with-tensorflow-backend/

https://github.com/keras-team/keras/issues/5345
