[CVPR 2016] Weakly Supervised Deep Detection Networks: Paper Notes
Weakly Supervised Deep Detection Networks, Hakan Bilen, Andrea Vedaldi
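Highlights
- Treats weakly supervised detection as a proposal-ranking problem. Comparing the class scores of all proposals yields a fairly accurate ranking, which matches how detection is evaluated (AP is computed over detections ranked by score).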

Related Work
The MIL strategy results in a non-convex optimization problem; in practice, solvers tend to get stuck in local optima, so the quality of the solution strongly depends on the initialization. Prior work has therefore focused on:
- developing various initialization strategies [19, 5, 32, 4]:
  - [19] propose a self-paced learning strategy.
  - [5] initialize object locations based on the objectness score.
  - [4] propose a multi-fold split of the training data to escape local optima.
- regularizing the optimization problem [31, 1]:
  - [31] apply Nesterov's smoothing technique to the latent SVM formulation.
  - [1] propose a smoothed version of MIL that softly labels object instances instead of choosing the highest-scoring ones.

Another line of research in WSD is based on identifying the similarity between image parts:
- [31] propose a discriminative graph-based algorithm that selects a subset of windows such that each window is connected to its nearest neighbors in positive images.
- [32] extend this method to discover multiple co-occurring part configurations.
- [36] propose an iterative technique that applies latent semantic clustering via probabilistic Latent Semantic Analysis (pLSA).
- [2] propose a formulation that jointly learns a discriminative model and enforces the similarity of the selected object regions via a discriminative convex clustering algorithm.
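Method
The method is simple and easy to follow. It consists of three main parts:
- The conv feature map and the region proposals are fed into a spatial pyramid pooling (SPP) layer, which extracts a feature vector for each region; these vectors then pass through two fc layers.
- Classification: the fc output goes through a softmax classifier normalized over classes, which scores the class of each region.
- Detection: the fc output also goes through a softmax, but instead of normalizing over classes it normalizes over the scores of all regions. Comparing regions with one another picks out the region that carries the most evidence for each class.

The two streams are then combined:
- The score of region r for class c is the product of the classification and detection outputs.
- The image-level score for class c is the sum of the scores of all regions for class c. Because the detection scores of a class sum to 1 over regions and each classification score is at most 1, this sum lies in (0, 1), so it can be trained directly against image-level labels.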
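The scoring steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `cls_logits` and `det_logits` are hypothetical per-region, per-class outputs of the two fc branches (R regions × C classes), and the SPP and fc layers themselves are not modeled.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # indexed as [region][class]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def wsddn_scores(cls_logits: Matrix, det_logits: Matrix) -> Tuple[Matrix, List[float]]:
    """Return (region_scores, image_scores) from the two hypothetical fc outputs."""
    num_regions, num_classes = len(cls_logits), len(cls_logits[0])
    # Classification stream: softmax over classes, one region at a time.
    sigma_cls = [softmax(row) for row in cls_logits]
    # Detection stream: softmax over regions, one class at a time.
    det_cols = [softmax([det_logits[r][c] for r in range(num_regions)])
                for c in range(num_classes)]
    # Score of region r for class c: product of the two streams.
    region_scores = [[sigma_cls[r][c] * det_cols[c][r] for c in range(num_classes)]
                     for r in range(num_regions)]
    # Image-level score for class c: sum over all regions.
    image_scores = [sum(region_scores[r][c] for r in range(num_regions))
                    for c in range(num_classes)]
    return region_scores, image_scores


if __name__ == "__main__":
    regions, image = wsddn_scores([[2.0, 0.0], [0.0, 0.0]], [[3.0, 0.0], [0.0, 0.0]])
    print(image)
```

The two streams share the same inputs and differ only in the axis of the softmax, which is what turns the detection stream into a competition among regions.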

The training loss function is as follows:
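For the data term, here is a minimal sketch assuming (as I read the paper) a binary log loss on the image-level scores with labels y_c ∈ {−1, +1}. Weight decay and the regulariser are left out, and `eps` is an added safeguard against log(0), not part of the paper.

```python
import math
from typing import List


def wsddn_data_loss(image_scores: List[float], labels: List[int], eps: float = 1e-12) -> float:
    """Binary log loss summed over classes for one image (weight decay omitted).

    image_scores[c]: image-level score phi_c in (0, 1).
    labels[c]: +1 if class c appears in the image, -1 otherwise.
    eps: hypothetical safeguard against log(0), not from the paper.
    """
    if len(image_scores) != len(labels):
        raise ValueError("one label per class is required")
    loss = 0.0
    for phi, y in zip(image_scores, labels):
        if y not in (-1, 1):
            raise ValueError("labels must be +1 or -1")
        # y * (phi - 1/2) + 1/2 is phi for y = +1 and 1 - phi for y = -1.
        p = y * (phi - 0.5) + 0.5
        loss -= math.log(max(p, eps))
    return loss
```

Writing the label as ±1 lets a single expression cover both the "class present" and "class absent" cases.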

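The last term is a regularisation term, the spatial regulariser (I changed it slightly based on my own understanding, since the paper's notation seems a bit off). It enforces smoothness of the solution by pulling features closer together, i.e. proposals that are close to the correct one should also receive high scores.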
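The regulariser described above can be sketched as follows. This is one possible reading, not the paper's exact formula: for every class present in the image it takes the highest-scoring region, and for each region overlapping it with IoU above `iou_thresh` (0.6 here, an assumed value) it adds the squared feature distance to that top region, weighted by the squared top score (also an assumption). The box format, feature vectors, and all names are hypothetical, and normalisation constants are omitted.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); hypothetical box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def spatial_regulariser(boxes: Sequence[Box],
                        features: Sequence[Sequence[float]],
                        region_scores: Sequence[Sequence[float]],
                        positive_classes: Sequence[int],
                        iou_thresh: float = 0.6) -> float:
    """Penalty that is small when regions overlapping the top box have similar features.

    region_scores[r][c] is the combined score of region r for class c;
    features[r] is the feature vector of region r. Only classes present in
    the image (positive_classes) contribute.
    """
    total = 0.0
    for c in positive_classes:
        # Highest-scoring region for class c.
        top = max(range(len(boxes)), key=lambda r: region_scores[r][c])
        weight = region_scores[top][c] ** 2  # assumed weighting
        for r in range(len(boxes)):
            if r == top or iou(boxes[r], boxes[top]) <= iou_thresh:
                continue
            sq_dist = sum((fr - ft) ** 2 for fr, ft in zip(features[r], features[top]))
            total += 0.5 * weight * sq_dist
    return total
```

Minimising this term during training pulls the features of regions around the top box toward its features, so they end up with similar scores; note that it anchors on a single top box per class.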
Experimental Results
The paper provides four models with different base networks: S (VGG-F), M (VGG-M-1024), L (VGG-VD16), and Ens (an ensemble of the first three).
- Ablation:
  - Object proposals:
    - Baseline mAP with Selective Search: S 31.1%, M 30.9%, L 24.3%, Ens. 33.3%
    - Edge Boxes: +0 to 1.2%
    - Edge Boxes + Edge Box score: +1.8 to 5.9%
  - Spatial regulariser (compared with Edge Boxes + Edge Box score): mAP +1.2 to 4.4%
- VOC2007:
  - mAP on test: S +2.9%, M +3.3%, L +3.2%, Ens. +7.7% compared with [36] + context
  - CorLoc on trainval: S +5.7%, M +7.6%, L +5%, Ens. +9.5% compared with [36]
  - Classification AP on test: S +7.9% compared with VGG-F, M +6.5% compared with VGG-M-1024, L +0.4% compared with VGG-VD16, Ens. -0.3% compared with VGG-VD16
- VOC2010:
  - mAP on test: +8.8% compared with [4]
  - CorLoc on trainval: +4.5% compared with [4]
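Limitations
An obvious shortcoming is that the method only handles the case where each class appears once in an image (the regulariser constrains only the highest-scoring box and the boxes around it). This also shows up in the failure cases reported in the paper.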