two_layer_net.ipynb

I had long misunderstood the output of the statement x.reshape(x.shape[0], -1):

import numpy as np

x = np.array([[1, 4, 7, 2], [2, 5, 7, 4]])
x0 = x.reshape(x.shape[0], -1)  # shape (2, 4): one row per original row
x1 = x.reshape(x.shape[1], -1)  # shape (4, 2): row-major flatten, regrouped into 4 rows
print(x0)
print(x1)

The actual output is:

[[1 4 7 2]
 [2 5 7 4]]
[[1 4]
 [7 2]
 [2 5]
 [7 4]]
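The key point is that reshape never reorders elements: it flattens the array in row-major order and regroups it, so x.reshape(x.shape[1], -1) is not a transpose. A quick check, continuing the snippet above:

print(x.flatten())  # [1 4 7 2 2 5 7 4] -- the row-major order reshape works from
print(x.T)          # the actual transpose: [[1 2] [4 5] [7 7] [2 4]]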

Affine layer: forward

# Test the affine_forward function

num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3

input_size = num_inputs * np.prod(input_shape)
# print(np.prod(input_shape))  # 120
weight_size = output_dim * np.prod(input_shape)

x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
# print(*input_shape)  # 4 5 6
print(np.shape(x))  # (2, 4, 5, 6)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)

out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967,  1.70660132,  1.91485297],
                        [ 3.25553199,  3.5141327,   3.77273342]])

# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))
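rel_error here is the helper defined near the top of the notebook; for reference, it is essentially the usual cs231n relative-error formula:

def rel_error(x, y):
    # maximum elementwise relative error, guarded against division by zero
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))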

The function to fill in is:

def affine_forward(x, w, b):
    out = None
    x_vector = x.reshape(x.shape[0], -1)  # flatten each example into a row vector: (N, D)
    out = x_vector.dot(w) + b             # (N, D) dot (D, M) + (M,) -> (N, M)
    cache = (x, w, b)
    return out, cache

Affine layer: backward


The function to fill in here is:

def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # shapes: x (10, 2, 3), w (6, 5), b (5,), dout (10, 5)
    dx = np.dot(dout, w.T)                          # (N, M) dot (D, M).T -> (N, D): (10, 6)
    dx = dx.reshape(x.shape)                        # reshape dx back to the shape of x: (10, 2, 3)
    dw = np.dot(x.reshape(x.shape[0], -1).T, dout)  # (D, N) dot (N, M) -> (D, M): (6, 10) dot (10, 5) = (6, 5)
    db = np.sum(dout, axis=0)                       # sum over the sample axis, giving shape (M,)
    return dx, dw, db
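The shape comments above match the notebook's test for this layer, which compares the analytic gradients against numerical ones. A sketch of that check (eval_numerical_gradient_array comes from cs231n.gradient_check):

from cs231n.gradient_check import eval_numerical_gradient_array

np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

# numerical gradients of the forward pass w.r.t. each input
dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))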

ReLU activation

Forward:

out = np.maximum(x, 0)

For the backward pass I got it wrong at first and wrote

dx = np.maximum(dx, 0)

which is clearly wrong: the test should be whether x is less than or equal to zero, not an operation on dx. Instead,

dx = np.copy(dout)
dx[x <= 0] = 0  # zero the gradient wherever the forward input was non-positive

does the job.
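Putting both pieces into the layer interface the assignment uses (my paraphrase of the two functions in layers.py):

def relu_forward(x):
    out = np.maximum(x, 0)
    cache = x  # keep the input around for the backward pass
    return out, cache


def relu_backward(dout, cache):
    x = cache
    dx = np.copy(dout)
    dx[x <= 0] = 0  # gradient flows only where the input was positive
    return dx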

Sandwich layers

Looking at layer_utils.py, it simply packages affine and relu together (sketched below); the errors come out around e-11 to e-12.
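Roughly, the forward/backward pair in layer_utils.py chains the two layers and bundles their caches:

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)  # affine layer
    out, relu_cache = relu_forward(a)      # followed by ReLU
    cache = (fc_cache, relu_cache)
    return out, cache


def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)        # undo the ReLU first
    dx, dw, db = affine_backward(da, fc_cache)  # then the affine layer
    return dx, dw, db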

Loss layers: SVM & Softmax

SVM

def svm_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)  # shift scores for numerical stability
    correct_class_scores = scores[np.arange(num_train), y]
    margins = np.maximum(0, scores - correct_class_scores[:, np.newaxis] + 1)
    margins[np.arange(num_train), y] = 0  # the correct class contributes no margin
    loss = np.sum(margins) / num_train

    num_pos = np.sum(margins > 0, axis=1)  # per example, how many classes have positive margin
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(num_train), y] -= num_pos
    dx /= num_train
    return loss, dx

I built a small 4×3 example to watch each step of the computation.
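The trace below came from a run like this, with temporary print statements added inside svm_loss to dump each intermediate (the input values are the ones shown):

x = np.array([[ 4.17943411e-04,  1.39710028e-03, -1.78590431e-03],
              [-7.08827734e-04, -7.47253161e-05, -7.75016769e-04],
              [-1.49797903e-04,  1.86172902e-03, -1.42552930e-03],
              [-3.76356699e-04, -3.42275390e-04,  2.94907637e-04]])
y = np.array([2, 1, 1, 0])
loss, dx = svm_loss(x, y)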

x:
[[ 4.17943411e-04  1.39710028e-03 -1.78590431e-03]
 [-7.08827734e-04 -7.47253161e-05 -7.75016769e-04]
 [-1.49797903e-04  1.86172902e-03 -1.42552930e-03]
 [-3.76356699e-04 -3.42275390e-04  2.94907637e-04]]
y:
[2 1 1 0]
scores:
[[-0.00097916  0.         -0.003183  ]
 [-0.0006341   0.         -0.00070029]
 [-0.00201153  0.         -0.00328726]
 [-0.00067126 -0.00063718  0.        ]]
correct class scores:
[-0.003183    0.          0.         -0.00067126]
margins (before zeroing the correct class):
[[1.00220385 1.003183   1.        ]
 [0.9993659  1.         0.99929971]
 [0.99798847 1.         0.99671274]
 [1.         1.00003408 1.00067126]]
margins (after zeroing the correct class):
[[1.00220385 1.003183   0.        ]
 [0.9993659  0.         0.99929971]
 [0.99798847 0.         0.99671274]
 [0.         1.00003408 1.00067126]]
num_pos:
[2 2 2 2]
dx (indicator of positive margins):
[[1. 1. 0.]
 [1. 0. 1.]
 [1. 0. 1.]
 [0. 1. 1.]]
dx (after subtracting num_pos at the correct class):
[[ 1.  1. -2.]
 [ 1. -2.  1.]
 [ 1. -2.  1.]
 [-2.  1.  1.]]

Softmax

Compared with the earlier softmax implementation (where the weight gradient was x.T.dot(dscores)), the gradient here is with respect to the scores themselves, so the x-transpose dot is dropped and dx is just dscores:

def softmax_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)  # shift for numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    loss = np.sum(-np.log(probs[np.arange(num_train), y])) / num_train

    # Compute the gradient
    dscores = probs
    dscores[np.arange(num_train), y] -= 1
    dscores /= num_train
    dx = dscores
    return loss, dx
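A quick sanity check I find useful (my own addition, not part of the notebook): with tiny random scores the probabilities are nearly uniform, so the loss should land near log(10) ≈ 2.3 for 10 classes:

num_classes, num_inputs = 10, 50
x = 0.001 * np.random.randn(num_inputs, num_classes)
y = np.random.randint(num_classes, size=num_inputs)
loss, dx = softmax_loss(x, y)
print(loss)  # expect roughly 2.3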

Two-layer network

For this part, look at the TwoLayerNet class in fc_net.py.

Assume the input dimension is D, the hidden (layer) dimension is H, and the number of classes is C.

The architecture is affine - relu - affine - softmax; gradient descent itself is not part of the class.
The model parameters are stored in the dictionary self.params.
from builtins import range
from builtins import object
import numpy as np

from ..layers import *
from ..layer_utils import *


class TwoLayerNet(object):

    def __init__(
        self,
        input_dim=3 * 32 * 32,
        hidden_dim=100,
        num_classes=10,
        weight_scale=1e-3,
        reg=0.0,
    ):
        self.params = {}
        self.reg = reg
        self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)
        self.params['b1'] = np.zeros(hidden_dim)
        self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)
        self.params['b2'] = np.zeros(num_classes)

    def loss(self, X, y=None):
        scores = None
        hidden_layer = np.maximum(0, np.dot(X, self.params['W1']) + self.params['b1'])  # ReLU activation
        scores = np.dot(hidden_layer, self.params['W2']) + self.params['b2']

        # If y is None then we are in test mode so just return scores
        if y is None:
            return scores

        loss, grads = 0, {}
        num_train = X.shape[0]
        scores -= np.max(scores, axis=1, keepdims=True)  # for numerical stability
        softmax_scores = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
        correct_class_scores = softmax_scores[range(num_train), y]
        data_loss = -np.log(correct_class_scores).mean()
        reg_loss = 0.5 * self.reg * (np.sum(self.params['W1'] ** 2) + np.sum(self.params['W2'] ** 2))
        loss = data_loss + reg_loss

        # Backward pass
        dscores = softmax_scores.copy()
        dscores[range(num_train), y] -= 1
        dscores /= num_train
        grads['W2'] = np.dot(hidden_layer.T, dscores) + self.reg * self.params['W2']
        grads['b2'] = np.sum(dscores, axis=0)
        dhidden = np.dot(dscores, self.params['W2'].T)
        dhidden[hidden_layer <= 0] = 0  # backpropagate through ReLU
        grads['W1'] = np.dot(X.T, dhidden) + self.reg * self.params['W1']
        grads['b1'] = np.sum(dhidden, axis=0)

        return loss, grads
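A minimal smoke test for the class as written above (the dimensions are arbitrary, chosen just to check shapes):

N, D, H, C = 3, 5, 50, 7
model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=1e-2, reg=0.1)
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

loss, grads = model.loss(X, y)
print('loss:', loss)
for name in sorted(grads):
    print(name, grads[name].shape)  # should match the shapes in self.params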
 
