keras: allocating GPU memory as needed & clearing unused variables to free GPU memory

Intro

Are you running out of GPU memory when using keras or tensorflow deep learning models, but only some of the time?

Are you curious about exactly how much GPU memory your tensorflow model uses during training?

Are you wondering if you can run two or more keras models on your GPU at the same time?

Background

By default, tensorflow pre-allocates nearly all of the available GPU memory, which is bad for a variety of use cases, especially production and memory profiling.

When keras uses tensorflow for its back-end, it inherits this behavior.
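
You can see this for yourself with nothing but a bare session; a minimal illustration (TF 1.x, on a machine with a GPU):

import tensorflow as tf

# With the default options, this grabs nearly all of the free GPU
# memory up front, even though no graph has been run yet.
sess = tf.Session()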

Setting tensorflow GPU memory options

For new models

Thankfully, tensorflow allows you to change how it allocates GPU memory, and to set a limit on how much GPU memory it is allowed to allocate.

Let's set GPU options on keras's example, Sequence classification with LSTM:

 
## keras example imports
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM

## extra imports to set GPU options
import tensorflow as tf
from keras import backend as k

###################################
# TensorFlow wizardry
config = tf.ConfigProto()
# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True
# Only allow a total of half the GPU memory to be allocated
config.gpu_options.per_process_gpu_memory_fraction = 0.5
# Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))
###################################

max_features = 1024  # vocabulary size, as in the keras example

model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# x_train, y_train, x_test and y_test are assumed to be prepared
# as in the keras example.
model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)

After the above, when we create the sequence classification model, it won't grab its share of GPU memory up front; instead, it will allocate GPU memory as needed during the calls to model.fit() and model.evaluate().

Additionally, with per_process_gpu_memory_fraction = 0.5, tensorflow will only allocate a total of half the available GPU memory.

If it tries to allocate more than half of the total GPU memory, tensorflow will throw a ResourceExhaustedError, and you’ll get a lengthy stack trace.
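
If a stack trace is too ungraceful for you, one option (my own sketch, reusing the model and data from the example above) is to catch the error and retry with a smaller batch:

import tensorflow as tf

try:
    model.fit(x_train, y_train, batch_size=16, epochs=10)
except tf.errors.ResourceExhaustedError:
    # The model wanted more than the fraction we allowed; retry with
    # a smaller batch size (or raise the memory fraction).
    model.fit(x_train, y_train, batch_size=8, epochs=10)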

If you have a Linux machine and an nvidia card, you can run watch -n 1 nvidia-smi to see how much GPU memory is in use, or configure a monitoring tool like monitorix to generate graphs for you.

[Figure: GPU memory usage, as shown in Monitorix for Linux]
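
If you'd rather profile from inside the training loop, a small Keras callback can poll nvidia-smi for you. This is my own sketch, not part of the keras API; the GPUMemoryLogger name is made up, and it assumes nvidia-smi is on your PATH:

import subprocess
from keras.callbacks import Callback

class GPUMemoryLogger(Callback):
    """Log GPU memory usage via nvidia-smi at the end of each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        used = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=memory.used',
             '--format=csv,noheader']).decode().strip()
        print('epoch %d: GPU memory used: %s' % (epoch, used))

# Pass it to fit() alongside your other arguments:
# model.fit(x_train, y_train, batch_size=16, epochs=10,
#           callbacks=[GPUMemoryLogger()])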

For a model that you’re loading

We can even set GPU memory management options for a model that’s already created and trained, and that we’re loading from disk for deployment or for further training.

For that, let's tweak keras's load_model example:

 
# keras example imports
from keras.models import load_model

## extra imports to set GPU options
import tensorflow as tf
from keras import backend as k

###################################
# TensorFlow wizardry
config = tf.ConfigProto()
# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True
# Only allow a total of half the GPU memory to be allocated
config.gpu_options.per_process_gpu_memory_fraction = 0.5
# Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))
###################################

# returns a compiled model, identical to the previous one
model = load_model('my_model.h5')
# TODO: classify all the things

Now, with your loaded model, you can open your favorite GPU monitoring tool and watch how the GPU memory usage changes under different loads.
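
For example, one quick (hypothetical) way to generate different loads is to run predictions over increasing batch sizes while your monitor is open; the input shape below is a placeholder you'd swap for your model's real one:

import numpy as np

# Feed random batches of increasing size and watch the allocator grow.
# The input shape (100,) is a placeholder; match your model's input.
for batch_size in (16, 64, 256, 1024):
    model.predict(np.random.rand(batch_size, 100))
    print('ran a batch of %d' % batch_size)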

Conclusion

Good news everyone! That sweet deep learning model you just made doesn’t actually need all that memory it usually claims!

And, now that you can tell tensorflow not to pre-allocate memory, you can get a much better idea of what kind of rig(s) you need in order to deploy your model into production.

Is this how you’re handling GPU memory management issues with tensorflow or keras?

Did I miss a better, cleaner way of handling GPU memory allocation with tensorflow and keras?

Let me know in the comments!

 
 
====================================================================================
 

How to remove stale models from GPU memory

import gc
from keras.models import Model, load_model
from keras import backend as K

m = Model(.....)                # build and train your model as usual
m.save(tmp_model_name)          # persist weights before tearing down
del m                           # drop the Python reference
K.clear_session()               # destroy the TF graph and free GPU memory
gc.collect()                    # make sure the garbage collector runs
m = load_model(tmp_model_name)  # reload into a fresh session
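
This pattern really pays off in a loop, e.g. a hyperparameter sweep; here's a sketch where build_model and the training data are placeholders for your own code:

import gc
from keras import backend as K

# Without clear_session(), every iteration would leave its graph and
# weights on the GPU, and memory usage would only ever grow.
for units in (64, 128, 256):
    m = build_model(units)  # hypothetical model-building function
    m.fit(x_train, y_train, epochs=1)
    m.save('model_%d.h5' % units)
    del m
    K.clear_session()
    gc.collect()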

References: https://michaelblogscode.wordpress.com/2017/10/10/reducing-and-profiling-gpu-memory-usage-in-keras-with-tensorflow-backend/

https://github.com/keras-team/keras/issues/5345
