【tf.keras】Resource exhausted: OOM when allocating tensor with shape [9216,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Running code similar to the following:

import tensorflow as tf

# get_AlexNet() and learning_rate are assumed to be defined elsewhere in the script.
while True:
    inputs, outputs = get_AlexNet()
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.summary()
    adam_opt = tf.keras.optimizers.Adam(learning_rate)
    # The compile step specifies the training configuration.
    model.compile(optimizer=adam_opt, loss='categorical_crossentropy', metrics=['accuracy'])
    # load weights from h5 file
    model.load_weights('alexnet_weights.h5')

eventually fails with the error:

OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[9216,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
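As a quick sanity check, the size of the tensor that failed to allocate can be worked out from the shape in the message. The sketch below assumes that `float` means 32-bit floats (4 bytes per element); the helper name `tensor_size_mib` is made up for illustration and is not part of TensorFlow.

```python
def tensor_size_mib(shape, bytes_per_element=4):
    """Return the size in MiB of a dense tensor with the given shape."""
    num_elements = 1
    for dim in shape:
        num_elements *= dim
    return num_elements * bytes_per_element / (1024 ** 2)


# Shape taken from the error message; 4 bytes per element is an assumption (float32).
size_mib = tensor_size_mib([9216, 4096])
print(f"{size_mib:.2f} MiB")
```

Under that assumption the result agrees with the 144.00MiB request reported by the allocator in the detailed log. A single tensor of this size is modest, which points at memory held by earlier loop iterations rather than at this one allocation.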

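Solution: clear the Keras session before building each new model, so that the memory held by the previous models is released: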

# Use the backend that belongs to tf.keras, since the model is built with tf.keras.
from tensorflow.keras import backend as K

K.clear_session()

For example:

import tensorflow as tf
from tensorflow.keras import backend as K

# get_AlexNet() and learning_rate are assumed to be defined elsewhere in the script.
while True:
    # Free the memory held by the previous model to prevent OOM.
    K.clear_session()
    inputs, outputs = get_AlexNet()
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.summary()
    adam_opt = tf.keras.optimizers.Adam(learning_rate)
    # The compile step specifies the training configuration.
    model.compile(optimizer=adam_opt, loss='categorical_crossentropy', metrics=['accuracy'])
    # load weights from h5 file
    model.load_weights('alexnet_weights.h5')
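The effect of `clear_session()` can be observed without a GPU or the AlexNet weights. The following is a minimal sketch, assuming a recent TensorFlow installation; `build_model` is a hypothetical stand-in for `get_AlexNet()` that builds a tiny network. Keras numbers automatically named layers per session, so a layer name that stays the same across iterations indicates that the global state left by the previous model was reset. The `training_4` and `metrics_4` prefixes in the traceback suggest that, in the failing run, state from several earlier models had accumulated instead.

```python
import gc

import tensorflow as tf


def build_model():
    """Hypothetical stand-in for get_AlexNet(): a tiny functional model."""
    inputs = tf.keras.Input(shape=(8,))
    hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(4, activation="softmax")(hidden)
    return tf.keras.Model(inputs=inputs, outputs=outputs)


layer_names = []
for _ in range(3):
    # Reset the global Keras state left over from the previous iteration.
    tf.keras.backend.clear_session()
    model = build_model()
    # Record the auto-generated name of the first Dense layer.
    layer_names.append(model.layers[1].name)
    # Drop the Python reference as well, so the old model can be collected.
    del model
    gc.collect()

print(layer_names)
```

Removing the `clear_session()` call from the loop should make the recorded names differ from one iteration to the next, because the layer counters keep growing along with the rest of the retained state.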

The full error output is as follows:

2019-06-03 21:54:24.789150: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 144.00MiB.  Current allocation summary follows.

2019-06-03 21:54:24.804684: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (256): 	Total Chunks: 243, Chunks in use: 243. 60.8KiB allocated for chunks. 60.8KiB in use in bin. 6.6KiB client-requested in use in bin.

2019-06-03 21:54:24.813190: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (512): 	Total Chunks: 19, Chunks in use: 19. 14.3KiB allocated for chunks. 14.3KiB in use in bin. 14.3KiB client-requested in use in bin.

2019-06-03 21:54:24.841197: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (1024): 	Total Chunks: 52, Chunks in use: 52. 62.5KiB allocated for chunks. 62.5KiB in use in bin. 60.6KiB client-requested in use in bin.

2019-06-03 21:54:24.843308: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (2048): 	Total Chunks: 2, Chunks in use: 2. 5.0KiB allocated for chunks. 5.0KiB in use in bin. 3.0KiB client-requested in use in bin.

2019-06-03 21:54:24.844847: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (4096): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2019-06-03 21:54:24.846267: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (8192): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2019-06-03 21:54:24.848125: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (16384): 	Total Chunks: 31, Chunks in use: 31. 511.0KiB allocated for chunks. 511.0KiB in use in bin. 496.0KiB client-requested in use in bin.

2019-06-03 21:54:24.849356: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (32768): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2019-06-03 21:54:24.850511: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (65536): 	Total Chunks: 16, Chunks in use: 16. 1.43MiB allocated for chunks. 1.43MiB in use in bin. 1.42MiB client-requested in use in bin.

2019-06-03 21:54:24.852015: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (131072): 	Total Chunks: 23, Chunks in use: 23. 3.72MiB allocated for chunks. 3.72MiB in use in bin. 3.46MiB client-requested in use in bin.

2019-06-03 21:54:24.863147: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (262144): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2019-06-03 21:54:24.864633: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (524288): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2019-06-03 21:54:24.865992: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (1048576): 	Total Chunks: 17, Chunks in use: 17. 21.15MiB allocated for chunks. 21.15MiB in use in bin. 19.92MiB client-requested in use in bin.

2019-06-03 21:54:24.867384: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (2097152): 	Total Chunks: 52, Chunks in use: 52. 144.75MiB allocated for chunks. 144.75MiB in use in bin. 137.86MiB client-requested in use in bin.

2019-06-03 21:54:24.868803: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (4194304): 	Total Chunks: 3, Chunks in use: 3. 17.16MiB allocated for chunks. 17.16MiB in use in bin. 10.13MiB client-requested in use in bin.

2019-06-03 21:54:24.870144: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2019-06-03 21:54:24.871061: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (16777216): 	Total Chunks: 3, Chunks in use: 2. 62.97MiB allocated for chunks. 42.20MiB in use in bin. 37.19MiB client-requested in use in bin.

2019-06-03 21:54:24.871849: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (33554432): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2019-06-03 21:54:24.874994: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (67108864): 	Total Chunks: 21, Chunks in use: 21. 1.40GiB allocated for chunks. 1.40GiB in use in bin. 1.31GiB client-requested in use in bin.

2019-06-03 21:54:24.875718: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (134217728): 	Total Chunks: 20, Chunks in use: 20. 2.98GiB allocated for chunks. 2.98GiB in use in bin. 2.81GiB client-requested in use in bin.

2019-06-03 21:54:24.876800: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

2019-06-03 21:54:24.877455: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:646] Bin for 144.00MiB was 128.00MiB, Chunk State:

2019-06-03 21:54:24.877906: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000B03E00000 of size 1280

2019-06-03 21:54:24.878316: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000B03E00500 of size 256

2019-06-03 21:54:24.879415: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000B03E00600 of size 256

2019-06-03 21:54:24.879816: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000B03E00700 of size 256

...

2019-06-03 21:54:24.998647: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 256733696 totalling 244.84MiB

2019-06-03 21:54:24.998857: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:678] Sum Total of in-use chunks: 4.60GiB

2019-06-03 21:54:24.999076: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:680] Stats:

Limit:                  4965636505

InUse:                  4943860224

MaxInUse:               4943860224

NumAllocs:                 2362778

MaxAllocSize:            516972544

2019-06-03 21:54:24.999520: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:279] ********x************************************************************************x*****************x

2019-06-03 21:54:25.001526: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1275] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[9216,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

2019-06-03 21:54:25.108672: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 372.96MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

2019-06-03 21:54:25.129713: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 482.40MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

2019-06-03 21:54:25.145367: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 331.52MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

Traceback (most recent call last):

  File "E:/PycharmProjects/ActiveLearning/AlexNet_AL.py", line 156, in <module>

    validation_data=(x_val, y_val))

  File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1363, in fit

    validation_steps=validation_steps)

  File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 264, in fit_loop

    outs = f(ins_batch)

  File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\keras\backend.py", line 2914, in __call__

    fetched = self._callable_fn(*array_vals)

  File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1382, in __call__

    run_metadata_ptr)

  File "C:\taotao\Python\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__

    c_api.TF_GetCode(self.status.status))

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[9216,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

	 [[Node: training_4/Adam/gradients/dense/kernel/Regularizer/Square_grad/Mul_1 = Mul[T=DT_FLOAT, _class=["loc:@training_4/Adam/gradients/AddN_5"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense/kernel/Regularizer/Square/ReadVariableOp, training_4/Adam/gradients/dense/kernel/Regularizer/Square_grad/Mul)]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[Node: metrics_4/acc/Mean/_1023 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1423_metrics_4/acc/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
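The Stats block in the log makes the shortfall concrete. The sketch below only plugs the `Limit` and `InUse` values copied from the log into a subtraction; the variable names are illustrative, and the calculation ignores fragmentation, so the contiguous space actually available may be smaller still.

```python
# Values copied from the "Stats:" block of the allocator log.
limit_bytes = 4965636505
in_use_bytes = 4943860224

# Size of the allocation that failed, as reported in the first log line.
requested_mib = 144.0

free_mib = (limit_bytes - in_use_bytes) / (1024 ** 2)
print(f"free: {free_mib:.2f} MiB, requested: {requested_mib:.2f} MiB")
print("fits" if free_mib >= requested_mib else "does not fit")
```

Almost all of the memory available to the allocator is already in use when the request arrives, which is consistent with models from earlier loop iterations still occupying the GPU.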

References

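Keras解决OOM超内存问题 (Fixing out-of-memory (OOM) errors in Keras) -- silent56_th

Keras 循环训练模型跑数据时内存泄漏的问题解决办法 (Fixing the memory leak when training Keras models in a loop) -- jemmie_w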
