Problem

While training a generative adversarial network (GAN) in PyTorch, I hit the error RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation. This post records how I tracked it down.

Environment

Windows 10 (version 2004)

Python 3.7.4

PyTorch 1.7.0 (CPU build)

Troubleshooting

Attempt 1

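The error message is not hard to understand: some variable needed to compute gradients has been modified by an in-place operation. What is an in-place operation? A typical example is x += 1, which changes x directly; it can be rewritten out of place as y = x + 1. Unfortunately, this did not solve my problem, but the approach is summarized below anyway.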

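Searching online turned up many related blog posts, and most of them give this explanation:

Since version 0.4.0 merged Variable and Tensor into a single Tensor type, in-place operations that used to work on Variables now raise this error on Tensors.

So the suggested fix is simple: convert every in-place operation into an out-of-place one, for example replace x += 1 with y = x + 1.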
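A minimal sketch that reproduces the error with an in-place operation and then applies the out-of-place fix. It assumes a non-leaf tensor: calling x += 1 directly on a leaf tensor with requires_grad=True raises a different error, so the sketch instead modifies the output of torch.sigmoid, which autograd saves for the backward pass. The function names are illustrative only.

```python
import torch

def inplace_backward():
    """Return the error message raised when a tensor saved for backward is modified in place."""
    x = torch.ones(3, requires_grad=True)
    y = torch.sigmoid(x)   # sigmoid's backward reuses its output y
    y += 1                 # in-place: changes y, which autograd still needs
    try:
        y.sum().backward()
    except RuntimeError as err:
        return str(err)
    return None

def out_of_place_backward():
    """Same computation written out of place, as z = y + 1."""
    x = torch.ones(3, requires_grad=True)
    y = torch.sigmoid(x)
    z = y + 1              # new tensor; the saved y is left untouched
    z.sum().backward()
    return x.grad          # d/dx sigmoid(x) = s * (1 - s)

print(inplace_backward())
print(out_of_place_backward())
```

The only difference between the two functions is whether the tensor autograd saved is overwritten; creating a new tensor costs a little memory but keeps the saved value intact.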

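That still leaves the question of how to find the in-place operation. A small trick: call y.backward() on intermediate results at different stages of the computation. If it raises the error, the problem lies before that point; otherwise the culprit comes after that line.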
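A different tool from the bisection trick above is PyTorch's anomaly detection mode, torch.autograd.detect_anomaly() (or the global switch torch.autograd.set_detect_anomaly(True)). When a backward pass fails in this mode, PyTorch also reports the forward-pass traceback of the operation whose gradient could not be computed. A minimal sketch, assuming torch.exp as the example operation because its backward reuses its output:

```python
import torch

def find_inplace_culprit():
    x = torch.ones(3, requires_grad=True)
    # Anomaly mode records a forward traceback for each op; if backward fails,
    # the traceback of the offending forward op is reported with the error.
    with torch.autograd.detect_anomaly():
        y = torch.exp(x)   # exp's backward needs its output y
        y.mul_(2)          # in-place change to a tensor autograd saved
        try:
            y.sum().backward()
        except RuntimeError as err:
            return str(err)
    return None

print(find_inplace_culprit())
```

Anomaly mode slows training noticeably, so it is best switched on only while debugging.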

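Attempt 2

My code did not contain any in-place operations at all, so the approach above was a dead end. I stared at the code and stepped through it in the debugger for ages without spotting anything...

Then a new idea struck me. My training loop looked like this: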

for epoch in range(1, epochs + 1):
    for idx, (lr, hr) in enumerate(traindata_loader):
        lrs = lr.to(device)
        hrs = hr.to(device)

        # update the discriminator
        netD.zero_grad()
        logits_fake = netD(netG(lrs).detach())
        logits_real = netD(hrs)
        # Label smoothing: real targets drawn from [0.85, 1.10), fake targets from [0, 0.15)
        real = (torch.rand(logits_real.size()) * 0.25 + 0.85).clone().detach().to(device)
        fake = (torch.rand(logits_fake.size()) * 0.15).clone().detach().to(device)
        d_loss = bce(logits_real, real) + bce(logits_fake, fake)
        # keep the graph, because logits_fake is reused in g_loss below
        d_loss.backward(retain_graph=True)
        optimizerD.step()

        # update the generator
        netG.zero_grad()
        # !!! problem line (the error itself is raised by g_loss.backward() below)
        g_loss = contentLoss(netG(lrs), hrs) + adversarialLoss(logits_fake)
        g_loss.backward()
        optimizerG.step()

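The discriminator loss's backward pass ran fine; the generator loss's backward pass failed. Since g_loss is the sum of two terms, the natural next step was to drop one of them and see what happens. The result: keeping only the first term, the program runs; as soon as g_loss includes the second term, it fails.

So I looked at the code of adversarialLoss: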

import torch
import torch.nn as nn

class AdversarialLoss(nn.Module):
    def __init__(self):
        super(AdversarialLoss, self).__init__()
        self.bce_loss = nn.BCELoss()

    def forward(self, logits_fake):
        # Adversarial loss: push D's output on generated samples toward 1
        # !!! the problem is here: adding .detach() to logits_fake makes the program run
        adversarial_loss = self.bce_loss(logits_fake, torch.ones_like(logits_fake))
        return 0.001 * adversarial_loss

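I could not see anything wrong with it, so I had to try things one at a time. Only two tensors are involved here: logits_fake and torch.ones_like(logits_fake). The latter is a constant, so I tried freezing logits_fake with .detach() so that it no longer takes part in training, and to my surprise the program ran: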

class AdversarialLoss(nn.Module):
    def __init__(self):
        super(AdversarialLoss, self).__init__()
        self.bce_loss = nn.BCELoss()

    def forward(self, logits_fake):
        # Adversarial loss
        # !!! the problem is here: with .detach() on logits_fake the program runs
        adversarial_loss = self.bce_loss(logits_fake.detach(), torch.ones_like(logits_fake))
        return 0.001 * adversarial_loss
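A quick check of what the .detach() workaround does, as a minimal sketch with hypothetical tiny stand-ins for netG and netD (the real architectures are not shown in the post). A loss computed from a detached tensor has no graph, so it contributes no gradient at all. The sketch also shows that in this loop, even without the extra detach, the adversarial term can only reach netD, because netG(lrs) was already detached before entering netD.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-ins for the real networks.
netG = nn.Linear(4, 4)
netD = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
bce = nn.BCELoss()

lrs = torch.randn(2, 4)
logits_fake = netD(netG(lrs).detach())   # same pattern as the training loop

# Workaround variant: detaching again leaves nothing to differentiate.
adv_detached = bce(logits_fake.detach(), torch.ones_like(logits_fake))
print(adv_detached.requires_grad)        # False

# Without the extra detach, gradients flow into netD only, never into netG.
adv = bce(logits_fake, torch.ones_like(logits_fake))
adv.backward()
print(netG.weight.grad is None)          # True
print(netD[0].weight.grad is not None)   # True
```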

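This told me that the modified tensor lies somewhere in the graph that produced logits_fake. The program now ran, but this is not necessarily a sensible fix: a detached logits_fake contributes no gradient, so the adversarial term is effectively switched off. AdversarialLoss itself never modifies logits_fake, so I went back to the training loop: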

for epoch in range(1, epochs + 1):
    for idx, (lr, hr) in enumerate(traindata_loader):
        lrs = lr.to(device)
        hrs = hr.to(device)

        # update the discriminator
        netD.zero_grad()
        logits_fake = netD(netG(lrs).detach())
        logits_real = netD(hrs)
        # Label smoothing
        real = (torch.rand(logits_real.size()) * 0.25 + 0.85).clone().detach().to(device)
        fake = (torch.rand(logits_fake.size()) * 0.15).clone().detach().to(device)
        d_loss = bce(logits_real, real) + bce(logits_fake, fake)
        d_loss.backward(retain_graph=True)
        # the discriminator update happens here
        optimizerD.step()

        # update the generator
        netG.zero_grad()
        # !!! problem line (the error itself is raised by g_loss.backward() below)
        g_loss = contentLoss(netG(lrs), hrs) + adversarialLoss(logits_fake)
        g_loss.backward()
        optimizerG.step()

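Notice that the discriminator is updated before the failing line, and the cause becomes clear: optimizerD.step() updates netD's parameters in place. netD's weights are among the tensors autograd saved when logits_fake was computed, and that graph was kept alive by retain_graph=True. When g_loss.backward() walks back through it, autograd finds the saved weights at a newer version than expected and raises the error. The fix is simply to move optimizerD.step() down to the second-to-last line of the loop. The modified code is: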

for epoch in range(1, epochs + 1):
    for idx, (lr, hr) in enumerate(traindata_loader):
        lrs = lr.to(device)
        hrs = hr.to(device)

        # update the discriminator
        netD.zero_grad()
        logits_fake = netD(netG(lrs).detach())
        logits_real = netD(hrs)
        # Label smoothing
        real = (torch.rand(logits_real.size()) * 0.25 + 0.85).clone().detach().to(device)
        fake = (torch.rand(logits_fake.size()) * 0.15).clone().detach().to(device)
        d_loss = bce(logits_real, real) + bce(logits_fake, fake)
        d_loss.backward(retain_graph=True)

        # update the generator
        netG.zero_grad()
        g_loss = contentLoss(netG(lrs), hrs) + adversarialLoss(logits_fake)
        # netD's weights are still unmodified here, so this backward succeeds.
        # It also accumulates the adversarial term's gradients into netD's .grad,
        # and since netG(lrs) was detached above, that term sends nothing to netG.
        g_loss.backward()

        # moved here: step the discriminator only after the generator's backward pass
        optimizerD.step()
        optimizerG.step()

The program finally runs. Yay ( •̀ ω •́ )y!
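A minimal reproduction of both orderings, as a sketch under stated assumptions: netG and netD are hypothetical tiny linear networks, the optimizers are plain SGD, the content loss is MSE, and label smoothing is dropped, since none of these details affect the error. With step_d_first=True the loop follows the original order and fails; with False it follows the reordered loop and runs.

```python
import torch
import torch.nn as nn

def train_step(step_d_first: bool) -> str:
    """One GAN iteration in the post's layout; returns "ok" or the error's first line."""
    torch.manual_seed(0)
    # Hypothetical stand-ins: the post does not show the real architectures.
    netG = nn.Linear(4, 4)
    netD = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
    optimizerD = torch.optim.SGD(netD.parameters(), lr=0.1)
    optimizerG = torch.optim.SGD(netG.parameters(), lr=0.1)
    bce = nn.BCELoss()
    lrs, hrs = torch.randn(2, 4), torch.randn(2, 4)

    # Discriminator pass, as in the post (netG output detached).
    netD.zero_grad()
    logits_fake = netD(netG(lrs).detach())
    logits_real = netD(hrs)
    d_loss = (bce(logits_real, torch.ones_like(logits_real))
              + bce(logits_fake, torch.zeros_like(logits_fake)))
    d_loss.backward(retain_graph=True)   # logits_fake's graph is reused below
    if step_d_first:
        optimizerD.step()                # in-place update of netD's saved weights

    # Generator pass: its backward goes through netD's saved weights again.
    netG.zero_grad()
    g_loss = (nn.functional.mse_loss(netG(lrs), hrs)
              + 0.001 * bce(logits_fake, torch.ones_like(logits_fake)))
    try:
        g_loss.backward()
    except RuntimeError as err:
        return "error: " + str(err).splitlines()[0]
    if not step_d_first:
        optimizerD.step()                # the post's fix: step D after G's backward
    optimizerG.step()
    return "ok"

print(train_step(step_d_first=True))    # original order
print(train_step(step_d_first=False))   # reordered
```

netD here has two linear layers on purpose: the second layer's weight is saved for backward because that layer's input depends on trainable parameters, which is what the in-place optimizer update then invalidates.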

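Summary

Cause: the discriminator was updated before the generator's gradients were computed. optimizerD.step() modified netD's parameters in place while the generator's backward pass still needed them, so the generator's gradient computation failed.

Fix: move the discriminator's update step (optimizerD.step()) after the generator's gradient computation (g_loss.backward()).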
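The reorder removes the error, but the resulting loop has two side effects: because logits_fake came from netG(lrs).detach(), the adversarial term never reaches netG's parameters, and because g_loss.backward() runs before optimizerD.step(), the generator objective's gradients with respect to netD are mixed into the discriminator update. A common alternative, swapped in here and not the author's code, is the structure used in PyTorch's DCGAN tutorial: step D on a detached fake, then run the updated netD again on the non-detached fake to build a fresh graph for G. A minimal sketch, assuming an MSE content loss, no label smoothing, the post's 0.001 adversarial weight, and hypothetical tiny networks in the usage lines:

```python
import torch
import torch.nn as nn

def gan_step(netG, netD, optimizerG, optimizerD, lrs, hrs):
    bce = nn.BCELoss()

    # 1) Discriminator: detach G's output so only netD is trained here.
    optimizerD.zero_grad()
    fake = netG(lrs)
    logits_real = netD(hrs)
    logits_fake = netD(fake.detach())
    d_loss = (bce(logits_real, torch.ones_like(logits_real))
              + bce(logits_fake, torch.zeros_like(logits_fake)))
    d_loss.backward()      # no retain_graph: this graph is not reused
    optimizerD.step()      # safe: the generator step builds a new netD graph

    # 2) Generator: run the updated netD on the non-detached output,
    #    so the adversarial gradient reaches netG's parameters.
    optimizerG.zero_grad()
    logits_fake_for_g = netD(fake)
    g_loss = (nn.functional.mse_loss(fake, hrs)
              + 0.001 * bce(logits_fake_for_g, torch.ones_like(logits_fake_for_g)))
    g_loss.backward()      # also fills netD's .grad; cleared by the next optimizerD.zero_grad()
    optimizerG.step()
    return d_loss.item(), g_loss.item()

# Usage with hypothetical stand-in networks.
torch.manual_seed(0)
G = nn.Linear(4, 4)
D = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
opt_g = torch.optim.SGD(G.parameters(), lr=0.1)
opt_d = torch.optim.SGD(D.parameters(), lr=0.1)
print(gan_step(G, D, opt_g, opt_d, torch.randn(2, 4), torch.randn(2, 4)))
```

No in-place update touches a tensor between its forward use and its backward pass: netD is stepped before its second forward, and netG is stepped only after its backward.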
