two_layer_net.ipynb

I had long misunderstood what the statement x.reshape(x.shape[0], -1) actually outputs:

import numpy as np

x = [[1,4,7,2],[2,5,7,4]]
x = np.array(x)
x0 = x.reshape(x.shape[0], -1)
x1 = x.reshape(x.shape[1], -1)
print(x0)
print(x1)

The actual output is

[[1 4 7 2]
 [2 5 7 4]]
[[1 4]
 [7 2]
 [2 5]
 [7 4]]
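The second call is not a transpose: reshape just refills the row-major flattened sequence of elements into the requested shape, two elements per row here.

print(x.flatten())                 # [1 4 7 2 2 5 7 4]
print(x.flatten().reshape(4, 2))   # the same sequence regrouped pairwise -> the (4, 2) array above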

Affine layer: forward

# Test the affine_forward function

num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3

input_size = num_inputs * np.prod(input_shape)
#print(np.prod(input_shape))  # 120
weight_size = output_dim * np.prod(input_shape)

x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
#print(*input_shape)  # 4 5 6
print(np.shape(x))  # (2, 4, 5, 6)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)

out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],
                        [ 3.25553199, 3.5141327,  3.77273342]])

# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))

The function to complete is

def affine_forward(x, w, b):
    out = None
    # Flatten each example into a row vector: (N, d_1, ..., d_k) -> (N, D)
    x_vector = x.reshape(x.shape[0], -1)
    out = x_vector.dot(w)
    out += b
    cache = (x, w, b)
    return out, cache
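A quick shape check with the test cell's x, w, b shows what the flatten buys us:

x_vector = x.reshape(x.shape[0], -1)   # (2, 4, 5, 6) -> (2, 120)
print(x_vector.shape, w.shape)         # (2, 120) (120, 3)
print((x_vector.dot(w) + b).shape)     # (2, 3): one row of class scores per input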

Affine layer: backward


The function to complete is

def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # Example shapes: x (10, 2, 3), w (6, 5), b (5,), dout (10, 5)
    dx = np.dot(dout, w.T)                           # (N, M) dot (D, M).T -> (N, D): (10, 6)
    dx = dx.reshape(x.shape)                         # reshape dx back to the shape of x: (10, 2, 3)
    dw = np.dot(x.reshape(x.shape[0], -1).T, dout)   # (D, N) dot (N, M) -> (D, M): (6, 10) dot (10, 5) = (6, 5)
    db = np.sum(dout, axis=0)                        # sum over the sample dimension -> gradient of shape (M,)
    return dx, dw, db
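A minimal numeric gradient check for affine_backward, as a sketch only: numeric_grad here is a hypothetical helper standing in for the assignment's eval_numerical_gradient_array, rel_error is assumed from the notebook, and the shapes follow the comment above.

def numeric_grad(f, x, df, h=1e-5):
    # Centered-difference approximation of d(sum(f(x) * df)) / dx.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        pos = f(x).copy()
        x[idx] = old - h
        neg = f(x).copy()
        x[idx] = old
        grad[idx] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

dx_num = numeric_grad(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = numeric_grad(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = numeric_grad(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
print('dx error: ', rel_error(dx, dx_num))
print('dw error: ', rel_error(dw, dw_num))
print('db error: ', rel_error(db, db_num))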

ReLU activation

Forward pass:

out = np.maximum(x,0)

For the backward pass I got it wrong at first and wrote

 dx=np.maximum(dx,0)

which is obviously wrong: the test should be on whether x is non-positive, not on dx. Instead,

dx = np.copy(dout)
dx[x <= 0] = 0

is all that's needed.
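Put together, a sketch of the two ReLU layer functions (caching x in the forward pass is an assumption, chosen so the backward pass can build the mask as above):

def relu_forward(x):
    out = np.maximum(x, 0)   # elementwise max with zero
    cache = x                # keep the input to know where the gradient is blocked
    return out, cache

def relu_backward(dout, cache):
    x = cache
    dx = np.copy(dout)
    dx[x <= 0] = 0           # pass the gradient only where the input was positive
    return dx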

Sandwich layers

Take a look at layer_utils.py: it just wraps affine and ReLU together, and the reported errors are likewise around e-11 to e-12; a sketch of the wrapper follows.
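Roughly, the sandwich layer chains the two forward passes and replays the caches in reverse order on the way back (a sketch assuming the affine_* and relu_* functions above):

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)   # affine first
    out, relu_cache = relu_forward(a)       # then ReLU
    cache = (fc_cache, relu_cache)
    return out, cache

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)    # undo ReLU first
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db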

Loss layers: SVM & Softmax

svm

def svm_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)
    correct_class_scores = scores[np.arange(num_train), y]
    margins = np.maximum(0, scores - correct_class_scores[:, np.newaxis] + 1)
    margins[np.arange(num_train), y] = 0
    loss = np.sum(margins) / num_train
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(num_train), y] -= num_pos
    dx /= num_train
    return loss, dx

I built a small 4-by-3 example to trace the intermediate values:

x:
[[ 4.17943411e-04  1.39710028e-03 -1.78590431e-03]
 [-7.08827734e-04 -7.47253161e-05 -7.75016769e-04]
 [-1.49797903e-04  1.86172902e-03 -1.42552930e-03]
 [-3.76356699e-04 -3.42275390e-04  2.94907637e-04]]
y:
[2 1 1 0]
scores:
[[-0.00097916  0.         -0.003183  ]
 [-0.0006341   0.         -0.00070029]
 [-0.00201153  0.         -0.00328726]
 [-0.00067126 -0.00063718  0.        ]]
correct class scores:
[-0.003183    0.          0.         -0.00067126]
margins (before zeroing the correct class):
[[1.00220385 1.003183   1.        ]
 [0.9993659  1.         0.99929971]
 [0.99798847 1.         0.99671274]
 [1.         1.00003408 1.00067126]]
margins (after zeroing the correct class):
[[1.00220385 1.003183   0.        ]
 [0.9993659  0.         0.99929971]
 [0.99798847 0.         0.99671274]
 [0.         1.00003408 1.00067126]]
num_pos:
[2 2 2 2]
dx (indicator of positive margins):
[[1. 1. 0.]
 [1. 0. 1.]
 [1. 0. 1.]
 [0. 1. 1.]]
dx (after subtracting num_pos at the correct class):
[[ 1.  1. -2.]
 [ 1. -2.  1.]
 [ 1. -2.  1.]
 [-2.  1.  1.]]
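The trace can be reproduced with a small driver along these lines (the seed and the 0.001 scale are assumptions, so the exact numbers will differ, and the intermediate arrays require print statements inside svm_loss):

np.random.seed(0)                  # hypothetical seed
x = 0.001 * np.random.randn(4, 3)  # small random scores, shape (4, 3)
y = np.array([2, 1, 1, 0])
loss, dx = svm_loss(x, y)
print('loss:', loss)
print('dx:\n', dx)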

softmax

Compared with the earlier softmax classifier, dx drops the x.T.dot(dscores) step: this layer's gradient is taken with respect to the scores themselves, so dx is simply dscores.

def softmax_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)   # shift for numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    loss = np.sum(-np.log(probs[np.arange(num_train), y])) / num_train
    # Compute the gradient
    dscores = probs
    dscores[np.arange(num_train), y] -= 1
    dscores /= num_train
    dx = dscores
    return loss, dx
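A quick sanity check in the spirit of the notebook: with small random scores the softmax loss should come out near log(C) (the seed and sizes here are arbitrary):

np.random.seed(1)                    # hypothetical seed
x = 0.001 * np.random.randn(50, 10)  # 50 examples, 10 classes
y = np.random.randint(10, size=50)
loss, dx = softmax_loss(x, y)
print(loss, np.log(10))              # loss should be close to log(10) ≈ 2.302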

Two-layer network

For this part, look at the TwoLayerNet class in fc_net.py.

Assume the input dimension is D, the hidden layer dimension is H, and the number of classes is C.

The architecture is affine - relu - affine - softmax; the class itself does not implement gradient descent.
The model parameters are stored in the dictionary self.params.
from builtins import range
from builtins import object
import numpy as np

from ..layers import *
from ..layer_utils import *


class TwoLayerNet(object):

    def __init__(
        self,
        input_dim=3 * 32 * 32,
        hidden_dim=100,
        num_classes=10,
        weight_scale=1e-3,
        reg=0.0,
    ):
        self.params = {}
        self.reg = reg
        self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)
        self.params['b1'] = np.zeros(hidden_dim)
        self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)
        self.params['b2'] = np.zeros(num_classes)

    def loss(self, X, y=None):
        scores = None
        hidden_layer = np.maximum(0, np.dot(X, self.params['W1']) + self.params['b1'])  # ReLU activation
        scores = np.dot(hidden_layer, self.params['W2']) + self.params['b2']

        # If y is None then we are in test mode so just return scores
        if y is None:
            return scores

        loss, grads = 0, {}
        num_train = X.shape[0]
        scores -= np.max(scores, axis=1, keepdims=True)  # for numerical stability
        softmax_scores = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
        correct_class_scores = softmax_scores[range(num_train), y]
        data_loss = -np.log(correct_class_scores).mean()
        reg_loss = 0.5 * self.reg * (np.sum(self.params['W1'] ** 2) + np.sum(self.params['W2'] ** 2))
        loss = data_loss + reg_loss

        # Backward pass
        dscores = softmax_scores.copy()
        dscores[range(num_train), y] -= 1
        dscores /= num_train
        grads['W2'] = np.dot(hidden_layer.T, dscores) + self.reg * self.params['W2']
        grads['b2'] = np.sum(dscores, axis=0)
        dhidden = np.dot(dscores, self.params['W2'].T)
        dhidden[hidden_layer <= 0] = 0  # backpropagate through ReLU
        grads['W1'] = np.dot(X.T, dhidden) + self.reg * self.params['W1']
        grads['b1'] = np.sum(dhidden, axis=0)
        return loss, grads
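A minimal smoke test with toy shapes (the numbers are arbitrary; note that this loss() expects X already flattened to (N, D)):

np.random.seed(0)
net = TwoLayerNet(input_dim=5, hidden_dim=4, num_classes=3, weight_scale=1e-2)
X = np.random.randn(6, 5)
y = np.random.randint(3, size=6)
scores = net.loss(X)            # test mode: (6, 3) class scores
loss, grads = net.loss(X, y)    # train mode: scalar loss plus grads for W1, b1, W2, b2
print(scores.shape, loss, sorted(grads.keys()))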
 
