two_layer_net.ipynb

I had long misunderstood what the statement x.reshape(x.shape[0], -1) actually outputs:

import numpy as np

x = [[1,4,7,2],[2,5,7,4]]
x = np.array(x)
x0 = x.reshape(x.shape[0], -1)
x1 = x.reshape(x.shape[1], -1)
print(x0)
print(x1)

The actual output is

[[1 4 7 2]
 [2 5 7 4]]
[[1 4]
 [7 2]
 [2 5]
 [7 4]]
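The second call is not a transpose: reshape just refills the row-major flattened sequence of elements into the requested shape, two elements per row here.

print(x.flatten())                 # [1 4 7 2 2 5 7 4]
print(x.flatten().reshape(4, 2))   # the same sequence regrouped pairwise -> the (4, 2) array above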

Affine layer: forward

# Test the affine_forward function

num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3

input_size = num_inputs * np.prod(input_shape)
#print(np.prod(input_shape))  # 120
weight_size = output_dim * np.prod(input_shape)

x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
#print(*input_shape)  # 4 5 6
print(np.shape(x))  # (2, 4, 5, 6)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)

out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],
                        [ 3.25553199, 3.5141327,  3.77273342]])

# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))

The function to complete is

def affine_forward(x, w, b):
    out = None
    # Flatten each example into a row vector: (N, d_1, ..., d_k) -> (N, D)
    x_vector = x.reshape(x.shape[0], -1)
    out = x_vector.dot(w)
    out += b
    cache = (x, w, b)
    return out, cache
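A quick shape check with the test cell's x, w, b shows what the flatten buys us:

x_vector = x.reshape(x.shape[0], -1)   # (2, 4, 5, 6) -> (2, 120)
print(x_vector.shape, w.shape)         # (2, 120) (120, 3)
print((x_vector.dot(w) + b).shape)     # (2, 3): one row of class scores per input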

Affine layer: backward


The function to complete is

def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # Example shapes: x (10, 2, 3), w (6, 5), b (5,), dout (10, 5)
    dx = np.dot(dout, w.T)                           # (N, M) dot (D, M).T -> (N, D): (10, 6)
    dx = dx.reshape(x.shape)                         # reshape dx back to the shape of x: (10, 2, 3)
    dw = np.dot(x.reshape(x.shape[0], -1).T, dout)   # (D, N) dot (N, M) -> (D, M): (6, 10) dot (10, 5) = (6, 5)
    db = np.sum(dout, axis=0)                        # sum over the sample dimension -> gradient of shape (M,)
    return dx, dw, db
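A minimal numeric gradient check for affine_backward, as a sketch only: numeric_grad here is a hypothetical helper standing in for the assignment's eval_numerical_gradient_array, rel_error is assumed from the notebook, and the shapes follow the comment above.

def numeric_grad(f, x, df, h=1e-5):
    # Centered-difference approximation of d(sum(f(x) * df)) / dx.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        pos = f(x).copy()
        x[idx] = old - h
        neg = f(x).copy()
        x[idx] = old
        grad[idx] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

dx_num = numeric_grad(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = numeric_grad(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = numeric_grad(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
print('dx error: ', rel_error(dx, dx_num))
print('dw error: ', rel_error(dw, dw_num))
print('db error: ', rel_error(db, db_num))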

ReLU activation

Forward pass:

out = np.maximum(x,0)

For the backward pass I got it wrong at first and wrote

 dx=np.maximum(dx,0)

which is obviously wrong: the test should be on whether x is non-positive, not on dx. Instead,

dx = np.copy(dout)
dx[x <= 0] = 0

is all that's needed.
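Put together, a sketch of the two ReLU layer functions (caching x in the forward pass is an assumption, chosen so the backward pass can build the mask as above):

def relu_forward(x):
    out = np.maximum(x, 0)   # elementwise max with zero
    cache = x                # keep the input to know where the gradient is blocked
    return out, cache

def relu_backward(dout, cache):
    x = cache
    dx = np.copy(dout)
    dx[x <= 0] = 0           # pass the gradient only where the input was positive
    return dx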

Sandwich layers

Take a look at layer_utils.py: it just wraps affine and ReLU together, and the reported errors are likewise around e-11 to e-12; a sketch of the wrapper follows.
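Roughly, the sandwich layer chains the two forward passes and replays the caches in reverse order on the way back (a sketch assuming the affine_* and relu_* functions above):

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)   # affine first
    out, relu_cache = relu_forward(a)       # then ReLU
    cache = (fc_cache, relu_cache)
    return out, cache

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)    # undo ReLU first
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db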

Loss layers: SVM & Softmax

svm

def svm_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)
    correct_class_scores = scores[np.arange(num_train), y]
    margins = np.maximum(0, scores - correct_class_scores[:, np.newaxis] + 1)
    margins[np.arange(num_train), y] = 0
    loss = np.sum(margins) / num_train
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(num_train), y] -= num_pos
    dx /= num_train
    return loss, dx

I built a small 4-by-3 example to trace the intermediate values:

x:
[[ 4.17943411e-04  1.39710028e-03 -1.78590431e-03]
 [-7.08827734e-04 -7.47253161e-05 -7.75016769e-04]
 [-1.49797903e-04  1.86172902e-03 -1.42552930e-03]
 [-3.76356699e-04 -3.42275390e-04  2.94907637e-04]]
y:
[2 1 1 0]
scores:
[[-0.00097916  0.         -0.003183  ]
 [-0.0006341   0.         -0.00070029]
 [-0.00201153  0.         -0.00328726]
 [-0.00067126 -0.00063718  0.        ]]
correct class scores:
[-0.003183    0.          0.         -0.00067126]
margins (before zeroing the correct class):
[[1.00220385 1.003183   1.        ]
 [0.9993659  1.         0.99929971]
 [0.99798847 1.         0.99671274]
 [1.         1.00003408 1.00067126]]
margins (after zeroing the correct class):
[[1.00220385 1.003183   0.        ]
 [0.9993659  0.         0.99929971]
 [0.99798847 0.         0.99671274]
 [0.         1.00003408 1.00067126]]
num_pos:
[2 2 2 2]
dx (indicator of positive margins):
[[1. 1. 0.]
 [1. 0. 1.]
 [1. 0. 1.]
 [0. 1. 1.]]
dx (after subtracting num_pos at the correct class):
[[ 1.  1. -2.]
 [ 1. -2.  1.]
 [ 1. -2.  1.]
 [-2.  1.  1.]]
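The trace can be reproduced with a small driver along these lines (the seed and the 0.001 scale are assumptions, so the exact numbers will differ, and the intermediate arrays require print statements inside svm_loss):

np.random.seed(0)                  # hypothetical seed
x = 0.001 * np.random.randn(4, 3)  # small random scores, shape (4, 3)
y = np.array([2, 1, 1, 0])
loss, dx = svm_loss(x, y)
print('loss:', loss)
print('dx:\n', dx)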

softmax

Compared with the earlier softmax classifier, dx drops the x.T.dot(dscores) step: this layer's gradient is taken with respect to the scores themselves, so dx is simply dscores.

def softmax_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)   # shift for numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    loss = np.sum(-np.log(probs[np.arange(num_train), y])) / num_train
    # Compute the gradient
    dscores = probs
    dscores[np.arange(num_train), y] -= 1
    dscores /= num_train
    dx = dscores
    return loss, dx
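A quick sanity check in the spirit of the notebook: with small random scores the softmax loss should come out near log(C) (the seed and sizes here are arbitrary):

np.random.seed(1)                    # hypothetical seed
x = 0.001 * np.random.randn(50, 10)  # 50 examples, 10 classes
y = np.random.randint(10, size=50)
loss, dx = softmax_loss(x, y)
print(loss, np.log(10))              # loss should be close to log(10) ≈ 2.302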

Two-layer network

For this part, look at the TwoLayerNet class in fc_net.py.

Assume the input dimension is D, the hidden layer dimension is H, and the number of classes is C.

The architecture is affine - relu - affine - softmax; the class itself does not implement gradient descent.
The model parameters are stored in the dictionary self.params.
from builtins import range
from builtins import object
import numpy as np

from ..layers import *
from ..layer_utils import *


class TwoLayerNet(object):

    def __init__(
        self,
        input_dim=3 * 32 * 32,
        hidden_dim=100,
        num_classes=10,
        weight_scale=1e-3,
        reg=0.0,
    ):
        self.params = {}
        self.reg = reg
        self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)
        self.params['b1'] = np.zeros(hidden_dim)
        self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)
        self.params['b2'] = np.zeros(num_classes)

    def loss(self, X, y=None):
        scores = None
        hidden_layer = np.maximum(0, np.dot(X, self.params['W1']) + self.params['b1'])  # ReLU activation
        scores = np.dot(hidden_layer, self.params['W2']) + self.params['b2']

        # If y is None then we are in test mode so just return scores
        if y is None:
            return scores

        loss, grads = 0, {}
        num_train = X.shape[0]
        scores -= np.max(scores, axis=1, keepdims=True)  # for numerical stability
        softmax_scores = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
        correct_class_scores = softmax_scores[range(num_train), y]
        data_loss = -np.log(correct_class_scores).mean()
        reg_loss = 0.5 * self.reg * (np.sum(self.params['W1'] ** 2) + np.sum(self.params['W2'] ** 2))
        loss = data_loss + reg_loss

        # Backward pass
        dscores = softmax_scores.copy()
        dscores[range(num_train), y] -= 1
        dscores /= num_train
        grads['W2'] = np.dot(hidden_layer.T, dscores) + self.reg * self.params['W2']
        grads['b2'] = np.sum(dscores, axis=0)
        dhidden = np.dot(dscores, self.params['W2'].T)
        dhidden[hidden_layer <= 0] = 0  # backpropagate through ReLU
        grads['W1'] = np.dot(X.T, dhidden) + self.reg * self.params['W1']
        grads['b1'] = np.sum(dhidden, axis=0)
        return loss, grads
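A minimal smoke test with toy shapes (the numbers are arbitrary; note that this loss() expects X already flattened to (N, D)):

np.random.seed(0)
net = TwoLayerNet(input_dim=5, hidden_dim=4, num_classes=3, weight_scale=1e-2)
X = np.random.randn(6, 5)
y = np.random.randint(3, size=6)
scores = net.loss(X)            # test mode: (6, 3) class scores
loss, grads = net.loss(X, y)    # train mode: scalar loss plus grads for W1, b1, W2, b2
print(scores.shape, loss, sorted(grads.keys()))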
 
