two_layer_net.ipynb

I had long misunderstood the output of the statement x.reshape(x.shape[0], -1):

import numpy as np

x = np.array([[1, 4, 7, 2], [2, 5, 7, 4]])
x0 = x.reshape(x.shape[0], -1)  # shape (2, 4): one row per original row
x1 = x.reshape(x.shape[1], -1)  # shape (4, 2): row-major flatten, regrouped into 4 rows
print(x0)
print(x1)

The actual output is:

[[1 4 7 2]
 [2 5 7 4]]
[[1 4]
 [7 2]
 [2 5]
 [7 4]]
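The key point is that reshape never reorders elements: it flattens the array in row-major order and regroups it, so x.reshape(x.shape[1], -1) is not a transpose. A quick check, continuing the snippet above:

print(x.flatten())  # [1 4 7 2 2 5 7 4] -- the row-major order reshape works from
print(x.T)          # the actual transpose: [[1 2] [4 5] [7 7] [2 4]]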

Affine layer: forward

# Test the affine_forward function

num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3

input_size = num_inputs * np.prod(input_shape)
# print(np.prod(input_shape))  # 120
weight_size = output_dim * np.prod(input_shape)

x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
# print(*input_shape)  # 4 5 6
print(np.shape(x))  # (2, 4, 5, 6)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)

out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967,  1.70660132,  1.91485297],
                        [ 3.25553199,  3.5141327,   3.77273342]])

# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))
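rel_error here is the helper defined near the top of the notebook; for reference, it is essentially the usual cs231n relative-error formula:

def rel_error(x, y):
    # maximum elementwise relative error, guarded against division by zero
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))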

The function to fill in is:

def affine_forward(x, w, b):
    out = None
    x_vector = x.reshape(x.shape[0], -1)  # flatten each example into a row vector: (N, D)
    out = x_vector.dot(w) + b             # (N, D) dot (D, M) + (M,) -> (N, M)
    cache = (x, w, b)
    return out, cache

Affine layer: backward


The function to fill in here is:

def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # shapes: x (10, 2, 3), w (6, 5), b (5,), dout (10, 5)
    dx = np.dot(dout, w.T)                          # (N, M) dot (D, M).T -> (N, D): (10, 6)
    dx = dx.reshape(x.shape)                        # reshape dx back to the shape of x: (10, 2, 3)
    dw = np.dot(x.reshape(x.shape[0], -1).T, dout)  # (D, N) dot (N, M) -> (D, M): (6, 10) dot (10, 5) = (6, 5)
    db = np.sum(dout, axis=0)                       # sum over the sample axis, giving shape (M,)
    return dx, dw, db
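The shape comments above match the notebook's test for this layer, which compares the analytic gradients against numerical ones. A sketch of that check (eval_numerical_gradient_array comes from cs231n.gradient_check):

from cs231n.gradient_check import eval_numerical_gradient_array

np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

# numerical gradients of the forward pass w.r.t. each input
dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))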

ReLU activation

Forward:

out = np.maximum(x, 0)

For the backward pass I got it wrong at first and wrote

dx = np.maximum(dx, 0)

which is clearly wrong: the test should be whether x is less than or equal to zero, not an operation on dx. Instead,

dx = np.copy(dout)
dx[x <= 0] = 0  # zero the gradient wherever the forward input was non-positive

does the job.
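Putting both pieces into the layer interface the assignment uses (my paraphrase of the two functions in layers.py):

def relu_forward(x):
    out = np.maximum(x, 0)
    cache = x  # keep the input around for the backward pass
    return out, cache


def relu_backward(dout, cache):
    x = cache
    dx = np.copy(dout)
    dx[x <= 0] = 0  # gradient flows only where the input was positive
    return dx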

Sandwich layers

Looking at layer_utils.py, it simply packages affine and relu together (sketched below); the errors come out around e-11 to e-12.
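Roughly, the forward/backward pair in layer_utils.py chains the two layers and bundles their caches:

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)  # affine layer
    out, relu_cache = relu_forward(a)      # followed by ReLU
    cache = (fc_cache, relu_cache)
    return out, cache


def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)        # undo the ReLU first
    dx, dw, db = affine_backward(da, fc_cache)  # then the affine layer
    return dx, dw, db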

Loss layers: SVM & Softmax

SVM

def svm_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)  # shift scores for numerical stability
    correct_class_scores = scores[np.arange(num_train), y]
    margins = np.maximum(0, scores - correct_class_scores[:, np.newaxis] + 1)
    margins[np.arange(num_train), y] = 0  # the correct class contributes no margin
    loss = np.sum(margins) / num_train

    num_pos = np.sum(margins > 0, axis=1)  # per example, how many classes have positive margin
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(num_train), y] -= num_pos
    dx /= num_train
    return loss, dx

I built a small 4×3 example to watch each step of the computation.
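The trace below came from a run like this, with temporary print statements added inside svm_loss to dump each intermediate (the input values are the ones shown):

x = np.array([[ 4.17943411e-04,  1.39710028e-03, -1.78590431e-03],
              [-7.08827734e-04, -7.47253161e-05, -7.75016769e-04],
              [-1.49797903e-04,  1.86172902e-03, -1.42552930e-03],
              [-3.76356699e-04, -3.42275390e-04,  2.94907637e-04]])
y = np.array([2, 1, 1, 0])
loss, dx = svm_loss(x, y)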

x:
[[ 4.17943411e-04  1.39710028e-03 -1.78590431e-03]
 [-7.08827734e-04 -7.47253161e-05 -7.75016769e-04]
 [-1.49797903e-04  1.86172902e-03 -1.42552930e-03]
 [-3.76356699e-04 -3.42275390e-04  2.94907637e-04]]
y:
[2 1 1 0]
scores:
[[-0.00097916  0.         -0.003183  ]
 [-0.0006341   0.         -0.00070029]
 [-0.00201153  0.         -0.00328726]
 [-0.00067126 -0.00063718  0.        ]]
correct class scores:
[-0.003183    0.          0.         -0.00067126]
margins (before zeroing the correct class):
[[1.00220385 1.003183   1.        ]
 [0.9993659  1.         0.99929971]
 [0.99798847 1.         0.99671274]
 [1.         1.00003408 1.00067126]]
margins (after zeroing the correct class):
[[1.00220385 1.003183   0.        ]
 [0.9993659  0.         0.99929971]
 [0.99798847 0.         0.99671274]
 [0.         1.00003408 1.00067126]]
num_pos:
[2 2 2 2]
dx (indicator of positive margins):
[[1. 1. 0.]
 [1. 0. 1.]
 [1. 0. 1.]
 [0. 1. 1.]]
dx (after subtracting num_pos at the correct class):
[[ 1.  1. -2.]
 [ 1. -2.  1.]
 [ 1. -2.  1.]
 [-2.  1.  1.]]

Softmax

Compared with the earlier softmax implementation (where the weight gradient was x.T.dot(dscores)), the gradient here is with respect to the scores themselves, so the x-transpose dot is dropped and dx is just dscores:

def softmax_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)  # shift for numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    loss = np.sum(-np.log(probs[np.arange(num_train), y])) / num_train

    # Compute the gradient
    dscores = probs
    dscores[np.arange(num_train), y] -= 1
    dscores /= num_train
    dx = dscores
    return loss, dx
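A quick sanity check I find useful (my own addition, not part of the notebook): with tiny random scores the probabilities are nearly uniform, so the loss should land near log(10) ≈ 2.3 for 10 classes:

num_classes, num_inputs = 10, 50
x = 0.001 * np.random.randn(num_inputs, num_classes)
y = np.random.randint(num_classes, size=num_inputs)
loss, dx = softmax_loss(x, y)
print(loss)  # expect roughly 2.3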

Two-layer network

For this part, look at the TwoLayerNet class in fc_net.py.

Assume the input dimension is D, the hidden (layer) dimension is H, and the number of classes is C.

The architecture is affine - relu - affine - softmax; gradient descent itself is not part of the class.
The model parameters are stored in the dictionary self.params.
from builtins import range
from builtins import object
import numpy as np

from ..layers import *
from ..layer_utils import *


class TwoLayerNet(object):

    def __init__(
        self,
        input_dim=3 * 32 * 32,
        hidden_dim=100,
        num_classes=10,
        weight_scale=1e-3,
        reg=0.0,
    ):
        self.params = {}
        self.reg = reg
        self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)
        self.params['b1'] = np.zeros(hidden_dim)
        self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)
        self.params['b2'] = np.zeros(num_classes)

    def loss(self, X, y=None):
        scores = None
        hidden_layer = np.maximum(0, np.dot(X, self.params['W1']) + self.params['b1'])  # ReLU activation
        scores = np.dot(hidden_layer, self.params['W2']) + self.params['b2']

        # If y is None then we are in test mode so just return scores
        if y is None:
            return scores

        loss, grads = 0, {}
        num_train = X.shape[0]
        scores -= np.max(scores, axis=1, keepdims=True)  # for numerical stability
        softmax_scores = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
        correct_class_scores = softmax_scores[range(num_train), y]
        data_loss = -np.log(correct_class_scores).mean()
        reg_loss = 0.5 * self.reg * (np.sum(self.params['W1'] ** 2) + np.sum(self.params['W2'] ** 2))
        loss = data_loss + reg_loss

        # Backward pass
        dscores = softmax_scores.copy()
        dscores[range(num_train), y] -= 1
        dscores /= num_train
        grads['W2'] = np.dot(hidden_layer.T, dscores) + self.reg * self.params['W2']
        grads['b2'] = np.sum(dscores, axis=0)
        dhidden = np.dot(dscores, self.params['W2'].T)
        dhidden[hidden_layer <= 0] = 0  # backpropagate through ReLU
        grads['W1'] = np.dot(X.T, dhidden) + self.reg * self.params['W1']
        grads['b1'] = np.sum(dhidden, axis=0)

        return loss, grads
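A minimal smoke test for the class as written above (the dimensions are arbitrary, chosen just to check shapes):

N, D, H, C = 3, 5, 50, 7
model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=1e-2, reg=0.1)
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

loss, grads = model.loss(X, y)
print('loss:', loss)
for name in sorted(grads):
    print(name, grads[name].shape)  # should match the shapes in self.params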
 
