[CVPR2015] Is object localization for free? – Weakly-supervised learning with convolutional neural networks论文笔记
p.p1 { margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px "Helvetica Neue"; color: #323333 }
p.p2 { margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px "Helvetica Neue"; color: #042eee }
span.s1 { }
span.s2 { text-decoration: underline }
Is object localization for free? –Weakly-supervised learning with convolutional neural networks. Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic
http://www.di.ens.fr/~josef/publications/Oquab15.pdf
p.p1 { margin: 0.0px 0.0px 0.0px 0.0px; font: 15.0px "Helvetica Neue"; color: #323333 }
p.p2 { margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px "Helvetica Neue"; color: #323333 }
li.li2 { margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px "Helvetica Neue"; color: #323333 }
span.s1 { }
span.s2 { background-color: #fefa00 }
ul.ul1 { list-style-type: disc }
ul.ul2 { list-style-type: circle }
亮点
- 一个好名字给了让读者开始阅读的理由
- global max pooling over sliding window的定位方法值得借鉴
方法
本文的目标是:设计一个弱监督分类网络,注意本文的目标主要是提升分类。因为是2015年的文章,方法比较简单原始。
Following three modifications to a classification network.
- Treat the fully connected layers as convolutions, which allows us to deal with nearly arbitrary-sized images as input.
- The aim is to apply the network to bigger images in a sliding window manner thus extending its output to n×m× K, where n and m denote the number of sliding window positions in the x- and y- direction in the image, respectively.
- 3xhxw —> convs —> kxmxn (k: number of classes)
- Explicitly search for the highest scoring object position in the image by adding a single global max-pooling layer at the output.
- kxmxn —> kx1x1
- The max-pooling operation hypothesizes the location of the object in the image at the position with the maximum score
- Use a cost function that can explicitly model multiple objects present in the image.
因为图中可能有很多物体,所以多类的分类loss不适用。作者把这个任务视为多个二分类问题,loss function和分类的分数如下
p.p1 { margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px "Helvetica Neue"; color: #323333 }
p.p2 { margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px "Helvetica Neue"; color: #323333; min-height: 15.0px }
p.p3 { margin: 0.0px 0.0px 0.0px 0.0px; font: 15.0px "Helvetica Neue"; color: #323333 }
li.li1 { margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px "Helvetica Neue"; color: #323333 }
span.s1 { }
ul.ul1 { list-style-type: disc }

training

muti-scale test

实验
classification
- mAP on VOC 2012 test: +3.1% compared with [56]
- mAP on VOC 2012 test: +7.6% compared with kx1x1 output and single scale training
- mAP on VOC: +2.6% compared with RCNN
- mAP on COCO 62.8%
Localisation
- Metric: if the maximal response across scales falls within the ground truth bounding box of an object of the same class within 18 pixels tolerance, we label the predicted location as correct. If not, then we count the response as a false positive (it hit the background), and we also increment the false negative count (no object was found).
- metric on VOC 2012 val: -0.3% compared with RCNN
- mAP on COCO 41.2%
缺点
- 定位评测的metric不具有权威性
- max pooling改为average pooling会不会对于多个instance的情况更好一些
[CVPR2015] Is object localization for free? – Weakly-supervised learning with convolutional neural networks论文笔记的更多相关文章
- Coursera, Deep Learning 4, Convolutional Neural Networks, week3, Object detection
学习目标 Understand the challenges of Object Localization, Object Detection and Landmark Finding Underst ...
- 论文笔记之:Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking
Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking arXiv Paper ...
- tensorfolw配置过程中遇到的一些问题及其解决过程的记录(配置SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving)
今天看到一篇关于检测的论文<SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real- ...
- [CVPR2017] Weakly Supervised Cascaded Convolutional Networks论文笔记
p.p1 { margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px "Helvetica Neue"; color: #042eee } p. ...
- A brief introduction to weakly supervised learning(简要介绍弱监督学习)
by 南大周志华 摘要 监督学习技术通过学习大量训练数据来构建预测模型,其中每个训练样本都有其对应的真值输出.尽管现有的技术已经取得了巨大的成功,但值得注意的是,由于数据标注过程的高成本,很多任务很难 ...
- [CVPR 2016] Weakly Supervised Deep Detection Networks论文笔记
p.p1 { margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px "Helvetica Neue"; color: #323333 } p. ...
- 课程四(Convolutional Neural Networks),第三 周(Object detection) —— 0.Learning Goals
Learning Goals: Understand the challenges of Object Localization, Object Detection and Landmark Find ...
- [C4W3] Convolutional Neural Networks - Object detection
第三周 目标检测(Object detection) 目标定位(Object localization) 大家好,欢迎回来,这一周我们学习的主要内容是对象检测,它是计算机视觉领域中一个新兴的应用方向, ...
- 论文笔记(7):Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
UC Berkeley的Deepak Pathak 使用了一个具有图像级别标记的训练数据来做弱监督学习.训练数据中只给出图像中包含某种物体,但是没有其位置信息和所包含的像素信息.该文章的方法将imag ...
随机推荐
- 开源视频平台:Kaltura
Kaltura是一个很优秀的开源视频平台.提供了视频的管理系统,视频的在线编辑系统等等一整套完整的系统,功能甚是强大. Kaltura不同于其他诸如Brightcove,Ooyala这样的网络视频平台 ...
- 《java入门第一季》之面向对象综合小案例
需求: /* 教练和运动员案例 乒乓球运动员和篮球运动员. 乒乓球教练和篮球教练. 跟乒乓球相关的人员都需要学习英语. 分析,这 ...
- ThreadLocal深入理解 修订版
本文是传智博客多线程视频的学习笔记. 原版本见 http://blog.csdn.net/dlf123321/article/details/42531979 ThreadLocal是一个和线程安全相 ...
- 数据结构是哈希表(hashTable)
哈希表也称为散列表,是根据关键字值(key value)而直接进行访问的数据结构.也就是说,它通过把关键字值映射到一个位置来访问记录,以加快查找的速度.这个映射函数称为哈希函数(也称为散列函数),映射 ...
- dex分包方案
当一个app的功能越来越复杂,代码量越来越多,也许有一天便会突然遇到下列现象: 1. 生成的apk在2.3以前的机器无法安装,提示INSTALL_FAILED_DEXOPT 2. 方法数量过多,编译时 ...
- hbase thrift 定义
/* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agre ...
- 【53】java的多线程同步剖析
synchronized关键字介绍: synchronized锁定的是对象,这个很重要 例子: class Sync { public synchronized void test() { Syste ...
- 基于友善之臂ARM-ContexA9-ADC驱动开发
ADC,就是模数转换器,什么是模数转换器? 模数转换器,在电子技术中即是将模拟信号转换成数字信号,也称为数字量化. 当然还有一种叫DAC,就是数模转换,意思相反,即是将数字信号转换成模拟信号. 在友善 ...
- PS 色调——颜色运算
通过对三个通道定义不同的运算,使图像的色调改变,进而生成不同色彩的图像. clc; clear all; Image=imread('4.jpg'); Image=double(Image); R=I ...
- 3 sum closest
Given an array S of n integers, find three integers in S such that the sum is closest to a given num ...