Derivative of the softmax loss function

姜楠 2024-10-09 12:19:25

Consider back-propagation in a neural network with a softmax classifier, which uses the softmax function:
\[\hat y_i=\frac{\exp(o_i)}{\sum_j \exp(o_j)}\]

This is used in a loss function of the form:
\[\mathcal{L}=-\sum_j{y_j\log \hat y_j}\]
where \(o\) is the vector of inputs to the softmax and \(y\) is the target vector. We need the derivative of \(\mathcal{L}\) with respect to \(o\).
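
As a concrete reference point, the two definitions above can be sketched in NumPy. This is a minimal illustration, not part of the original post: the function names `softmax` and `cross_entropy` and the example values of `o` and `y` are assumptions chosen for the sketch, and the max-subtraction is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(o):
    """Softmax of a 1-D vector o: exp(o_i) / sum_j exp(o_j)."""
    # Subtracting max(o) cancels in the ratio but avoids overflow in exp.
    e = np.exp(o - np.max(o))
    return e / e.sum()

def cross_entropy(y, y_hat):
    """Cross-entropy loss: -sum_j y_j * log(y_hat_j)."""
    return -np.sum(y * np.log(y_hat))

# Illustrative values only (assumed, not from the original post).
o = np.array([1.0, 2.0, 3.0])   # softmax inputs
y = np.array([0.0, 0.0, 1.0])   # one-hot target, true class is index 2

y_hat = softmax(o)
loss = cross_entropy(y, y_hat)
print(y_hat, loss)
```

With a one-hot target, the loss reduces to the negative log-probability assigned to the true class.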

Derivative of the softmax function

By the quotient rule, if \(i=j\),
\[\frac{\partial \hat y_j}{\partial o_i}=\frac{\exp(o_i)\sum_k \exp(o_k) - \exp(o_i)\exp(o_i)}{(\sum_k \exp(o_k))^2}=\hat y_i(1-\hat y_i)\]
if \(i\ne j\),
\[\frac{\partial \hat y_j}{\partial o_i}=\frac{0 - \exp(o_i)\exp(o_j)}{(\sum_k \exp(o_k))^2}=-\hat y_i \hat y_j\]

These two cases can be conveniently combined using the Kronecker delta, so the derivative becomes

\[\frac{\partial \hat y_j}{\partial o_i}=\hat y_j(\delta_{ij}-\hat y_i)\]

where the Kronecker delta \(\delta_{ij}\) is defined as:
\[\delta_{ij} = \begin{cases}
0 &\text{if } i \neq j, \\
1 &\text{if } i=j. \end{cases}\]
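
The softmax derivative can be checked numerically. The sketch below builds the matrix with entries \(\hat y_j(\delta_{ij}-\hat y_i)\), which is \(\hat y_i(1-\hat y_i)\) on the diagonal and \(-\hat y_i\hat y_j\) off it, and compares it against central finite differences. The helper names `softmax_jacobian` and `numerical_jacobian`, the step size `eps`, and the test vector are assumptions made for illustration.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))  # shift for numerical stability
    return e / e.sum()

def softmax_jacobian(o):
    """J[i, j] = d y_hat_j / d o_i = y_hat_j * (delta_ij - y_hat_i)."""
    y_hat = softmax(o)
    # diag(y_hat) supplies the delta_ij term, the outer product the y_hat_i * y_hat_j term.
    return np.diag(y_hat) - np.outer(y_hat, y_hat)

def numerical_jacobian(o, eps=1e-6):
    """Central-difference estimate of the same matrix."""
    n = o.size
    J = np.zeros((n, n))
    for i in range(n):
        step = np.zeros(n)
        step[i] = eps
        J[i, :] = (softmax(o + step) - softmax(o - step)) / (2 * eps)
    return J

o = np.array([0.5, -1.0, 2.0])  # illustrative input (assumed)
J = softmax_jacobian(o)
max_abs_error = np.max(np.abs(J - numerical_jacobian(o)))
print(max_abs_error)
```

Each row of `J` sums to zero, because the softmax outputs always sum to one and so their total cannot change when any single input changes.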

Derivative of the cross-entropy loss function

\[\begin{split}\frac{\partial \mathcal{L}}{\partial o_i}&=-\sum_k y_k\frac{\partial \log \hat y_k}{\partial o_i}=-\sum_k y_k\frac{1}{\hat y_k}\frac{\partial \hat y_k}{\partial o_i}\\
&=-y_i(1-\hat y_i)-\sum_{k\neq i}y_k\frac{1}{\hat y_k}(-\hat y_k \hat y_i)\\
&=-y_i(1-\hat y_i)+\sum_{k\neq i}y_k \hat y_i\\
&=-y_i +y_i\hat y_i+ \hat y_i\sum_{k\ne i}{y_k}\\
&=\hat y_i\sum_k{y_k}-y_i\\
&=\hat y_i-y_i\end{split}\]

The last step uses \(\sum_k y_k=1\), which holds because \(y\) is a one-hot vector: its only non-zero element is \(1\).

Finally, we get
\[\frac{\partial \mathcal{L}}{\partial o_i} = \hat y_i - y_i\]
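
The result \(\hat y - y\) can be verified with a finite-difference gradient check. This is a minimal sketch under assumed names (`loss`, `analytic_grad`, `numerical_grad`) and assumed example values, not code from the original post.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))  # shift for numerical stability
    return e / e.sum()

def loss(o, y):
    """Cross-entropy of softmax(o) against the target y."""
    return -np.sum(y * np.log(softmax(o)))

def analytic_grad(o, y):
    """Gradient from the derivation: dL/do_i = y_hat_i - y_i."""
    return softmax(o) - y

def numerical_grad(o, y, eps=1e-6):
    """Central-difference estimate of dL/do."""
    grad = np.zeros_like(o)
    for i in range(o.size):
        step = np.zeros_like(o)
        step[i] = eps
        grad[i] = (loss(o + step, y) - loss(o - step, y)) / (2 * eps)
    return grad

o = np.array([0.2, -0.7, 1.5, 0.0])  # illustrative input (assumed)
y = np.array([0.0, 1.0, 0.0, 0.0])   # one-hot target (assumed)

g_analytic = analytic_grad(o, y)
g_numeric = numerical_grad(o, y)
print(np.max(np.abs(g_analytic - g_numeric)))
```

The gradient is negative only at the true class and positive everywhere else, so a gradient-descent step raises the input of the true class and lowers the others.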
