在这项作业中，将实现语言网络，并将其应用于 COCO 数据集上的图像标题。然后将训练生成对抗网络，生成与训练数据集相似的图像。最后，您将学习自我监督学习，自动学习无标签数据集的视觉表示。
本作业的目标如下：
1.理解并实现 RNN 和 Transformer 网络。将它们与 CNN 网络相结合，为图像添加标题。
2.了解如何训练和实现生成对抗网络（GAN），以生成与数据集中的样本相似的图像。
3.了解如何利用自我监督学习技术来帮助完成图像分类任务。

Q1: Network Visualization: Saliency Maps, Class Visualization, and Fooling Images

The notebook Network_Visualization.ipynb will introduce the pretrained SqueezeNet model, compute gradients with respect to images, and use them to produce saliency maps and fooling images.

我们将探索使用image gradients来生成新图像。

在训练模型时，我们定义了一个损失函数来衡量我们目前对模型表现的不满。然后我们使用反向传播来计算损耗相对于模型参数的梯度，并对模型参数进行梯度下降以使损耗最小化。
在这里，我们将做一些略有不同的事情。我们将从一个CNN模型开始，该模型已经过预训练，可以在ImageNet数据集上执行图像分类。我们将使用这个模型来定义一个损失函数，它量化了我们当前对自己图像的不满。然后，我们将使用反向传播来计算这种损失相对于图像像素的梯度。然后，我们将保持模型不变，并对图像执行梯度下降，以合成一幅最大限度减少损失的新图像。
我们将探索三种图像生成技术：
显著图Saliency Maps：可以使用显著图来判断图像的哪一部分影响了网络做出的分类决策。需要完成cs231n/net_visualization_pytorch.py当中的compute_saliency_maps函数

 1 def compute_saliency_maps(X, y, model):

 2     """

 3     Compute a class saliency map using the model for images X and labels y.

 4

 5     Input:

 6     - X: Input images; Tensor of shape (N, 3, H, W)

 7     - y: Labels for X; LongTensor of shape (N,)

 8     - model: A pretrained CNN that will be used to compute the saliency map.

 9

10     Returns:

11     - saliency: A Tensor of shape (N, H, W) giving the saliency maps for the input

12     images.

13     """

14     # Make sure the model is in "test" mode

15     # "test" mode下，模型通常会关闭一些训练中使用的特定层（如 dropout 或 batch normalization）

16     model.eval()

17

18     # Make input tensor require gradient

19     # 告诉 PyTorch 在计算模型的前向传播时跟踪输入图像 X 的梯度。

20     X.requires_grad_()

21

22     saliency = None

23     ##############################################################################

24     # TODO: Implement this function. Perform a forward and backward pass through #

25     # the model to compute the gradient of the correct class score with respect  #

26     # to each input image. You first want to compute the loss over the correct   #

27     # scores (we'll combine losses across a batch by summing), and then compute  #

28     # the gradients with a backward pass.                                        #

29     ##############################################################################

30     # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

31

32     # 执行Forward pass

33     scores = model(X)

34     # y.view(-1, 1) 将标签 y 变形为列向量

35     # 使用 gather(1,) 从 scores 中选择正确类别的分数

36     correct_scores = scores.gather(1, y.view(-1, 1))

37     # Compute loss

38     # 使用负的正确类别分数的和作为损失，因为我们想要最大化这个分数。

39     loss = -correct_scores.sum()

40

41     # 执行Backward pass

42     loss.backward()

43     # Compute the saliency map

44     # 梯度.绝对值.三通道当中的最大值，dim=1即对应(N, 3, H, W)的3

45     # 注意若没有[0]，则第一个张量是最大值的张量，第二个张量是对应最大值的索引。

46     saliency = X.grad.abs().max(dim=1)[0]

47

48     # Clear gradients for next iteration

49     # 清零输入图像的梯度，以确保下一次迭代时梯度不会累积。

50     X.grad.data.zero_()

51

52     # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

53     ##############################################################################

54     #                             END OF YOUR CODE                               #

55     ##############################################################################

56     return saliency

Fooling Images：可以干扰输入图像，使其在人类看来是相同的，但会被预先训练的网络错误分类。需要完成cs231n/net_visualization_pytorch.py当中的make_fooling_image函数

 1 def make_fooling_image(X, target_y, model):

 2     """

 3     Generate a fooling image that is close to X, but that the model classifies

 4     as target_y.

 5

 6     Inputs:

 7     - X: Input image; Tensor of shape (1, 3, 224, 224)

 8     - target_y: An integer in the range [0, 1000)

 9     - model: A pretrained CNN

10

11     Returns:

12     - X_fooling: An image that is close to X, but that is classifed as target_y

13     by the model.

14     """

15     # Initialize our fooling image to the input image, and make it require gradient

16     X_fooling = X.clone()

17     X_fooling = X_fooling.requires_grad_()

18

19     learning_rate = 1

20     ##############################################################################

21     # TODO: Generate a fooling image X_fooling that the model will classify as   #

22     # the class target_y. You should perform gradient ascent on the score of the #

23     # target class, stopping when the model is fooled.                           #

24     # When computing an update step, first normalize the gradient:               #

25     #   dX = learning_rate * g / ||g||_2                                         #

26     #                                                                            #

27     # You should write a training loop.                                          #

28     #                                                                            #

29     # HINT: For most examples, you should be able to generate a fooling image    #

30     # in fewer than 100 iterations of gradient ascent.                           #

31     # You can print your progress over iterations to check your algorithm.       #

32     ##############################################################################

33     # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

34

35     # Set the model to evaluation mode

36     model.eval()

37

38     # Define a criterion (loss function) to maximize the target class score

39     # criterion = torch.nn.CrossEntropyLoss()

40

41     #初始分类

42     scores = model(X_fooling)

43

44     _, y_predit = scores.max(dim = 1)

45

46     iter = 0

47

48     while(y_predit != target_y):

49         iter += 1

50

51         target_score = scores[0, target_y]

52         target_score.backward()

53         grad = X_fooling.grad / X_fooling.grad.norm()

54         X_fooling.data += learning_rate * grad

55

56         X_fooling.grad.zero_()

57

58         model.zero_rgrad()

59

60         scores = model(X_fooling)

61         _,y_predit=scores.max(dim = 1)

62

63     print("Iteration Count: %d"% iter)

64

65

66     # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

67     ##############################################################################

68     #                             END OF YOUR CODE                               #

69     ##############################################################################

70     return X_fooling

例如idx = 1，target_y = 6，输出为

类别可视化Class Visualization：可以合成图像以最大化特定类别的分类分数；这可以让我们对网络在对该类别的图像进行分类时寻找什么有所了解。

 1 def class_visualization_update_step(img, model, target_y, l2_reg, learning_rate):

 2     ########################################################################

 3     # TODO: Use the model to compute the gradient of the score for the     #

 4     # class target_y with respect to the pixels of the image, and make a   #

 5     # gradient step on the image using the learning rate. Don't forget the #

 6     # L2 regularization term!                                              #

 7     # Be very careful about the signs of elements in your code.            #

 8     ########################################################################

 9     # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

10

11     score = model(img)

12     target_score = score[0,target_y]

13     target_score.backward()

14

15     im_grad = img.grad - l2_reg * img

16     grad = im_grad / im_grad.norm()

17     img.data += learning_rate * grad

18     img.grad.zero_()

19     model.zero_grad()

20

21     # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

22     ########################################################################

23     #                             END OF YOUR CODE                         #

24     ########################################################################

25

26 def create_class_visualization(target_y, model, dtype, **kwargs):

27     """

28     Generate an image to maximize the score of target_y under a pretrained model.

29

30     Inputs:

31     - target_y: Integer in the range [0, 1000) giving the index of the class

32     - model: A pretrained CNN that will be used to generate the image

33     - dtype: Torch datatype to use for computations

34

35     Keyword arguments:

36     - l2_reg: Strength of L2 regularization on the image

37     - learning_rate: How big of a step to take

38     - num_iterations: How many iterations to use

39     - blur_every: How often to blur the image as an implicit regularizer

40     - max_jitter: How much to gjitter the image as an implicit regularizer

41     - show_every: How often to show the intermediate result

42     """

43     model.type(dtype)

44

45     l2_reg = kwargs.pop('l2_reg', 1e-3)

46     learning_rate = kwargs.pop('learning_rate', 25)

47     num_iterations = kwargs.pop('num_iterations', 100)

48     blur_every = kwargs.pop('blur_every', 10)

49     max_jitter = kwargs.pop('max_jitter', 16)

50     show_every = kwargs.pop('show_every', 25)

51

52     # 随机初始化生成图像，其数据类型为 dtype，并标记为需要梯度计算

53     img = torch.randn(1, 3, 224, 224).mul_(1.0).type(dtype).requires_grad_()

54

55     for t in range(num_iterations):

56         # 在每次迭代中，随机对图像进行轻微的抖动，这是为了得到稍微更好的结果。

57         ox, oy = random.randint(0, max_jitter), random.randint(0, max_jitter)

58         img.data.copy_(jitter(img.data, ox, oy))

59         class_visualization_update_step(img, model, target_y, l2_reg, learning_rate)

60         # 恢复图像，撤销之前的抖动。

61         img.data.copy_(jitter(img.data, -ox, -oy))

62

63         # As regularizer, clamp and periodically blur the image

64         for c in range(3):

65             lo = float(-SQUEEZENET_MEAN[c] / SQUEEZENET_STD[c])

66             hi = float((1.0 - SQUEEZENET_MEAN[c]) / SQUEEZENET_STD[c])

67             img.data[:, c].clamp_(min=lo, max=hi)

68         # 每隔一定的迭代次数对图像进行模糊处理

69         if t % blur_every == 0:

70             blur_image(img.data, sigma=0.5)

71

72         # Periodically show the image

73         if t == 0 or (t + 1) % show_every == 0 or t == num_iterations - 1:

74             plt.imshow(deprocess(img.data.clone().cpu()))

75             class_name = class_names[target_y]

76             plt.title('%s\nIteration %d / %d' % (class_name, t + 1, num_iterations))

77             plt.gcf().set_size_inches(4, 4)

78             plt.axis('off')

79             plt.show()

80

81     return deprocess(img.data.cpu())

net_visualizaiton_pytorch.py还附上了抖动之类的函数

 1 def jitter(X, ox, oy):

 2     """

 3     Helper function to randomly jitter an image.

 4

 5     Inputs

 6     - X: PyTorch Tensor of shape (N, C, H, W)

 7     - ox, oy: Integers giving number of pixels to jitter along W and H axes

 8

 9     Returns: A new PyTorch Tensor of shape (N, C, H, W)

10     """

11     # 如果 ox 不等于零，表示需要在图像的水平方向进行抖动，

12     # 将图像沿着水平方向切分成两部分，然后将右侧的部分移动到左侧。

13     if ox != 0:

14         left = X[:, :, :, :-ox]

15         right = X[:, :, :, -ox:]

16         X = torch.cat([right, left], dim=3)

17     # 如果 oy 不等于零，表示需要在图像的垂直方向进行抖动，

18     # 将图像沿着垂直方向切分成两部分，然后将上侧的部分移动到下侧。

19     if oy != 0:

20         top = X[:, :, :-oy]

21         bottom = X[:, :, -oy:]

22         X = torch.cat([bottom, top], dim=2)

23     return X

高斯模糊

1 def blur_image(X, sigma=1):

2     X_np = X.cpu().clone().numpy() # 转换成numpy数组

3     X_np = gaussian_filter1d(X_np, sigma, axis=2) # 水平方向

4     X_np = gaussian_filter1d(X_np, sigma, axis=3) # 垂直方向

5     X.copy_(torch.Tensor(X_np).type_as(X)) # 转换为pytorch张量

6     return X

Q2: Image Captioning with Vanilla RNNs

The notebook RNN_Captioning.ipynb will walk you through the implementation of vanilla recurrent neural networks and apply them to image captioning on COCO.

Network_Visualization.ipynb 笔记本将引入预训练的 SqueezeNet 模型，计算与图像相关的梯度，并利用这些梯度生成显著性地图和傻瓜图像。

Q3: Image Captioning with Transformers

The notebook Transformer_Captioning.ipynb will walk you through the implementation of a Transformer model and apply it to image captioning on COCO.

Q4: Generative Adversarial Networks

In the notebook Generative_Adversarial_Networks.ipynb you will learn how to generate images that match a training dataset and use these models to improve classifier performance when training on a large amount of unlabeled data and a small amount of labeled data. When first opening the notebook, go to Runtime > Change runtime type and set Hardware accelerator to GPU.

Q5: Self-Supervised Learning for Image Classification

In the notebook Self_Supervised_Learning.ipynb, you will learn how to leverage self-supervised pretraining to obtain better performance on image classification tasks. When first opening the notebook, go to Runtime > Change runtime type and set Hardware accelerator to GPU.

Extra Credit: Image Captioning with LSTMs

The notebook LSTM_Captioning.ipynb will walk you through the implementation of Long-Short Term Memory (LSTM) RNNs and apply them to image captioning on COCO.

参考：https://github.com/hanlulu1998/CS231n/tree/master/assignment3

CS231N Assignment3 笔记（更新中）的更多相关文章

SQL手工注入入门级笔记(更新中)
一.字符型注入针对如下php代码进行注入: $sql="select user_name from users where name='$_GET['name']'"; 正常访问 ...
Java编程思想—读书笔记(更新中)
第1章对象导论 1.4 被隐藏的具体实现访问控制的原因: 让客户端程序员无法触及他们不应该触及的部分(不是用户解决特定问题所需的接口的一部分) 允许库设计者可以改变类内容的工作方式而不用担心会影响 ...
Python3学习笔记-更新中
1.Python概况 2.Anaconda安装及使用 3.Pycharm安装及使用 4.Hello World!!! 5.数据类型及类型转换 6.分支结构 7.循环语句 8.异常
微信小程序练习笔记（更新中。。。）
微信小程序练习笔记微信小程序的练习笔记,用来整理思路的,文档持续更新中... 案例一:实现行的删除和增加操作 test.js // 当我们在特定方法中创建对象或者定义变量给与初始值的时候,它是局部 ...
笔记：认识 head 标签待更新中……
文档的头部描述了文档的各种属性和信息,包括文档的标题等.绝大多数文档头部包含的数据都不会真正作为内容显示给读者. 下面这些标签可用在 head 部分: <head> <title&g ...
CS231n课程笔记翻译9：卷积神经网络笔记
译者注:本文翻译自斯坦福CS231n课程笔记ConvNet notes,由课程教师Andrej Karpathy授权进行翻译.本篇教程由杜客和猴子翻译完成,堃堃和李艺颖进行校对修改. 原文如下内容列 ...
CS231n课程笔记翻译8：神经网络笔记 part3
译者注:本文智能单元首发,译自斯坦福CS231n课程笔记Neural Nets notes 3,课程教师Andrej Karpathy授权翻译.本篇教程由杜客翻译完成,堃堃和巩子嘉进行校对修改.译文含 ...
CS231n课程笔记翻译7：神经网络笔记 part2
译者注:本文智能单元首发,译自斯坦福CS231n课程笔记Neural Nets notes 2,课程教师Andrej Karpathy授权翻译.本篇教程由杜客翻译完成,堃堃进行校对修改.译文含公式和代 ...
CS231n课程笔记翻译6：神经网络笔记 part1
译者注:本文智能单元首发,译自斯坦福CS231n课程笔记Neural Nets notes 1,课程教师Andrej Karpathy授权翻译.本篇教程由杜客翻译完成,巩子嘉和堃堃进行校对修改.译文含 ...
CS231n课程笔记翻译5：反向传播笔记
译者注:本文智能单元首发,译自斯坦福CS231n课程笔记Backprop Note,课程教师Andrej Karpathy授权翻译.本篇教程由杜客翻译完成,堃堃和巩子嘉进行校对修改.译文含公式和代码, ...

随机推荐

想了解Xtrabackup备份原理和常见问题分析，看这篇就够了
摘要:本文来自华为云MySQL研发团队,主要分享了MySQL备份工具Xtrabackup的备份过程.华为云数据库团队对其做的优化改进,以及在使用中可能遇到的问题与解决方法. 本文分享自华为云社区< ...
列存Delta表是个什么东东
摘要:本文从delta表的概念.来历.用法.开启后的影响,delta表数据转移到主表几个方面做了详细的介绍. 本文分享自华为云社区<GaussDB(DWS) 列存delta表的简单介绍>, ...
火山引擎 DataLeap 套件下构建数据目录（Data Catalog）系统的实践
更多技术交流.求职机会,欢迎关注字节跳动数据平台微信公众号,回复[1]进入官方交流群摘要 Data Catalog 产品,通过汇总技术和业务元数据,解决大数据生产者组织梳理数据.数据消费者找数和理解 ...
PPT 工作需求：如何利用PPT做活动海报&H5？
PPT 工作需求:如何利用PPT做活动海报? 图片素材 + 字体 + 封面排版 PPT 工作需求:如何利用PPT制作H5? https://www.maka.im/muban http://www.p ...
Docker 安装 Elasticsearch、Kibana
为了Skywalking 准备 elasticsearch 至少需要2G内存 docker pull elasticsearch:7.9.3 docker run --name elasticsea ...
ME21N 采购订单新增页签增强
1.实现效果根据客制化需求,要在采购订单中新增大量字段,所以要在界面上添加一个单独的页签.效果如下: 2.增强实现 2.1.增强结构因为是在抬头上边添加,所以增强CI_EKKODB结构 2.2.函 ...
为什么加了@Transactional注解，事务没有回滚？
在昨天的<事务管理入门>一文发布之后,有读者联系说根据文章尝试,加了@Transactional注解之后,事务并没有回滚.经过一顿沟通排查之后,找到了原因,在此记录一下,给后面如果碰到类似 ...
《深入理解计算机系统》（CSAPP）实验四 —— Attack Lab
这是CSAPP的第四个实验,这个实验比较有意思,也比较难.通过这个实验我们可以更加熟悉GDB的使用和机器代码的栈和参数传递机制. @ 目录实验目的准备工作内容简介代码注入攻击 Level 1 ...
Liunx快捷命令(别名)与快捷方式(软/硬链接)
一.快捷命令(别名)-临时生效1.命令:alias 别名='原命令' 2.举例:给检查防火墙的命令设置别名 [root@localhost ~]# alias fhq='firewall-cmd -- ...
Pgsql之查询一段时间内的所有日期
前几天干活儿的时候,项目中有这么个需求,需要用pgsql查询两个日期间的所有日期,包括年月日,下面贴代码: 1 select date(t) as day 2 from 3 generate_seri ...

CS231N Assignment3 笔记（更新中）