CS231N Assignment 1 two_layer_net Notes
two_layer_net.ipynb
I had long misunderstood the output of the statement x.reshape(x.shape[0], -1):
import numpy as np

x = np.array([[1, 4, 7, 2], [2, 5, 7, 4]])
x0 = x.reshape(x.shape[0], -1)  # 2 rows
x1 = x.reshape(x.shape[1], -1)  # 4 rows
print(x0)
print(x1)
The actual output is (NumPy reshapes in row-major order, so reshape(4, -1) regroups the eight elements into four rows of two):
[[1 4 7 2]
[2 5 7 4]]
[[1 4]
[7 2]
[2 5]
[7 4]]
Affine layer: forward
# Test the affine_forward function
num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3
input_size = num_inputs * np.prod(input_shape)
# print(np.prod(input_shape))  # 120
weight_size = output_dim * np.prod(input_shape)
x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
# print(*input_shape)  # 4 5 6
print(np.shape(x))  # (2, 4, 5, 6)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)
out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967,  1.70660132,  1.91485297],
                        [ 3.25553199,  3.5141327,   3.77273342]])
# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))
The function to complete is:
def affine_forward(x, w, b):
    out = None
    x_vector = x.reshape(x.shape[0], -1)  # flatten each example into a row: (N, D)
    out = x_vector.dot(w)                 # (N, D) dot (D, M) -> (N, M)
    out += b
    cache = (x, w, b)                     # needed later by affine_backward
    return out, cache
Affine layer: backward
The function to complete is as follows (the shape comments refer to the notebook's gradient-check test, where x is (10, 2, 3), w is (6, 5), b is (5,), and dout is (10, 5)):
def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # shapes: x (10, 2, 3), w (6, 5), b (5,), dout (10, 5)
    dx = np.dot(dout, w.T)      # (N, M) dot (D, M).T -> (N, D): (10, 6)
    dx = dx.reshape(x.shape)    # reshape dx back to the shape of x: (10, 2, 3)
    dw = np.dot(x.reshape(x.shape[0], -1).T, dout)  # (D, N) dot (N, M) -> (D, M): (6, 10) dot (10, 5) = (6, 5)
    db = np.sum(dout, axis=0)   # sum over the sample axis -> gradient of shape (M,)
    return dx, dw, db
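As a quick check, the backward pass can be compared against a centered-difference numerical gradient. This is a self-contained sketch; num_grad is a hypothetical stand-in for the assignment's eval_numerical_gradient_array helper:

import numpy as np

def num_grad(f, x, df, h=1e-5):
    # centered-difference numerical gradient of sum(f(x) * df) with respect to x
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h
        pos = f(x).copy()
        x[i] = old - h
        neg = f(x).copy()
        x[i] = old
        grad[i] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

out, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
dx_num = num_grad(lambda x: affine_forward(x, w, b)[0], x, dout)
print(np.max(np.abs(dx - dx_num)))  # should be around 1e-10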
ReLU activation
Forward pass:
out = np.maximum(x, 0)
For the backward pass I got it wrong at first and wrote
dx = np.maximum(dx, 0)
which is clearly wrong: the mask should test whether the forward input x is non-positive, not dx, since the gradient only flows through positions where x was positive. The fix is
dx = np.copy(dout)
dx[x <= 0] = 0
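Putting the two directions together, a minimal sketch of the pair of layer functions (following the layers.py convention of returning a cache from the forward pass):

def relu_forward(x):
    out = np.maximum(x, 0)  # elementwise max with zero
    cache = x               # keep the input for the backward pass
    return out, cache

def relu_backward(dout, cache):
    x = cache
    dx = np.copy(dout)
    dx[x <= 0] = 0          # zero gradient wherever the forward input was <= 0
    return dx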
Sandwich layers
Looking at layer_utils.py: it just wraps affine and ReLU together, and the gradient-check errors are again around e-11 to e-12.
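As a sketch of what that wrapper amounts to, assuming the affine_* and relu_* functions above:

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)  # affine first
    out, relu_cache = relu_forward(a)      # then ReLU
    cache = (fc_cache, relu_cache)
    return out, cache

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)   # undo the ReLU first
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db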
Loss layers: SVM & Softmax
svm
def svm_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)  # shift for numerical stability
    correct_class_scores = scores[np.arange(num_train), y]
    margins = np.maximum(0, scores - correct_class_scores[:, np.newaxis] + 1)
    margins[np.arange(num_train), y] = 0
    loss = np.sum(margins) / num_train
    num_pos = np.sum(margins > 0, axis=1)  # classes violating the margin, per example
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(num_train), y] -= num_pos
    dx /= num_train
    return loss, dx
I made a small (4, 3) example to trace the intermediate values:
x:
[[ 4.17943411e-04  1.39710028e-03 -1.78590431e-03]
 [-7.08827734e-04 -7.47253161e-05 -7.75016769e-04]
 [-1.49797903e-04  1.86172902e-03 -1.42552930e-03]
 [-3.76356699e-04 -3.42275390e-04  2.94907637e-04]]
y:
[2 1 1 0]
scores:
[[-0.00097916  0.         -0.003183  ]
 [-0.0006341   0.         -0.00070029]
 [-0.00201153  0.         -0.00328726]
 [-0.00067126 -0.00063718  0.        ]]
correct class scores:
[-0.003183    0.          0.         -0.00067126]
margins (before zeroing the correct class):
[[1.00220385 1.003183   1.        ]
 [0.9993659  1.         0.99929971]
 [0.99798847 1.         0.99671274]
 [1.         1.00003408 1.00067126]]
margins (after zeroing the correct class):
[[1.00220385 1.003183   0.        ]
 [0.9993659  0.         0.99929971]
 [0.99798847 0.         0.99671274]
 [0.         1.00003408 1.00067126]]
num_pos:
[2 2 2 2]
dx (indicator of positive margins):
[[1. 1. 0.]
 [1. 0. 1.]
 [1. 0. 1.]
 [0. 1. 1.]]
dx (after subtracting num_pos at the correct class, before dividing by num_train):
[[ 1.  1. -2.]
 [ 1. -2.  1.]
 [ 1. -2.  1.]
 [-2.  1.  1.]]
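A quick sanity check on this trace: with near-zero scores every incorrect class violates the margin by roughly 1, so with C = 3 classes the loss should be close to C - 1 = 2 (here sum(margins) / 4 ≈ 1.9999). The x above came from an unrecorded random draw, so this reproduction uses its own data:

np.random.seed(0)           # arbitrary seed, not the one used above
x = 1e-3 * np.random.randn(4, 3)
y = np.array([2, 1, 1, 0])
loss, dx = svm_loss(x, y)
print(loss)                 # ≈ 2.0, i.e. num_classes - 1
print(dx * 4)               # recovers the integer pattern (dx before /= num_train)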
softmax
For dx, the x.T.dot(dscores) step from the earlier softmax classifier is dropped: that multiplication belongs to the affine backward pass, so this loss layer returns dscores itself as dx.
def softmax_loss(x, y):
    loss, dx = None, None
    num_train = x.shape[0]
    scores = x - np.max(x, axis=1, keepdims=True)  # shift for numerical stability
    exp_scores = np.exp(scores)
    # correct_class_scores = scores[np.arange(num_train), y]
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    loss = np.sum(-np.log(probs[np.arange(num_train), y])) / num_train
    # Compute the gradient
    dscores = probs
    dscores[np.arange(num_train), y] -= 1
    dscores /= num_train
    dx = dscores
    return loss, dx
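The gradient lines follow from differentiating the cross-entropy loss. For a single example with L = -log p_y and p_j = e^{s_j} / Σ_k e^{s_k}, one gets ∂L/∂s_j = p_j - 1[j = y]: exactly probs with 1 subtracted at the correct class, then averaged over the batch by the final division.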
Two-layer network
For this part, look at the TwoLayerNet class in fc_net.py.
Assume the input dimension is D, the hidden-layer dimension is H, and the number of classes is C.
from builtins import range
from builtins import object
import numpy as np

from ..layers import *
from ..layer_utils import *


class TwoLayerNet(object):
    def __init__(
        self,
        input_dim=3 * 32 * 32,
        hidden_dim=100,
        num_classes=10,
        weight_scale=1e-3,
        reg=0.0,
    ):
        self.params = {}
        self.reg = reg
        self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)
        self.params['b1'] = np.zeros(hidden_dim)
        self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)
        self.params['b2'] = np.zeros(num_classes)

    def loss(self, X, y=None):
        scores = None
        hidden_layer = np.maximum(0, np.dot(X, self.params['W1']) + self.params['b1'])  # ReLU activation
        scores = np.dot(hidden_layer, self.params['W2']) + self.params['b2']

        # If y is None then we are in test mode, so just return scores
        if y is None:
            return scores

        loss, grads = 0, {}
        num_train = X.shape[0]
        scores -= np.max(scores, axis=1, keepdims=True)  # for numerical stability
        softmax_scores = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
        correct_class_scores = softmax_scores[range(num_train), y]
        data_loss = -np.log(correct_class_scores).mean()
        reg_loss = 0.5 * self.reg * (np.sum(self.params['W1'] ** 2) + np.sum(self.params['W2'] ** 2))
        loss = data_loss + reg_loss

        # Backward pass
        dscores = softmax_scores.copy()
        dscores[range(num_train), y] -= 1
        dscores /= num_train
        grads['W2'] = np.dot(hidden_layer.T, dscores) + self.reg * self.params['W2']
        grads['b2'] = np.sum(dscores, axis=0)
        dhidden = np.dot(dscores, self.params['W2'].T)
        dhidden[hidden_layer <= 0] = 0  # backpropagate through ReLU
        grads['W1'] = np.dot(X.T, dhidden) + self.reg * self.params['W1']
        grads['b1'] = np.sum(dhidden, axis=0)

        return loss, grads
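A minimal smoke test, just to show the two calling modes of loss (the shapes here are made up for illustration):

model = TwoLayerNet(input_dim=5, hidden_dim=4, num_classes=3, weight_scale=1e-2, reg=0.1)
X = np.random.randn(10, 5)
y = np.random.randint(3, size=10)
scores = model.loss(X)              # test mode: returns the (10, 3) score matrix
loss, grads = model.loss(X, y)      # train mode: returns loss and the gradient dict
print(loss, sorted(grads.keys()))   # grads has keys 'W1', 'W2', 'b1', 'b2'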
1.准备材料 开发板(正点原子stm32f407探索者开发板V2.4) STM32CubeMX软件(Version 6.10.0) 野火DAP仿真器 keil µVision5 IDE(MDK-Arm ...