(Repost) ICCV 2015: Twenty One Hottest Research Papers

ICCV 2015: Twenty one hottest research papers
“Geometry vs Recognition” becomes ConvNet-for-X
Computer vision used to be cleanly separated into two schools: geometry and recognition. Geometric methods like structure from motion and optical flow usually focus on measuring objective real-world quantities, such as 3D distances, directly from images. Recognition techniques like support vector machines and probabilistic graphical models traditionally focus on perceiving high-level semantic information (e.g., is this a dog or a table?) directly from images.
The world of computer vision has changed. We now have powerful convolutional neural networks that are able to extract just about anything directly from images. So if your input is an image (or set of images), then there’s probably a ConvNet for your problem. While you do need a large labeled dataset, believe me when I say that collecting a large dataset is much easier than manually tweaking knobs inside your 100K-line codebase. As we’re about to see, the separation between geometric methods and learning-based methods is no longer easily discernible.
By 2016 just about everybody in the computer vision community will have tasted the power of ConvNets, so let’s take a look at some of the hottest new research directions in computer vision.
ICCV 2015’s Twenty One Hottest Research Papers

This December in Santiago, Chile, the International Conference on Computer Vision (ICCV) 2015 is going to bring together the world’s leading researchers in Computer Vision, Machine Learning, and Computer Graphics.
To no surprise, this year’s ICCV is filled with lots of ConvNets, but this time these Deep Learning tools are being applied to much, much more creative tasks. Let’s take a look at the following twenty one ICCV 2015 research papers, which will hopefully give you a taste of where the field is going.
1. Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images Mateusz Malinowski, Marcus Rohrbach, Mario Fritz

“We propose a novel approach based on recurrent neural networks for the challenging task of answering of questions about images. It combines a CNN with a LSTM into an end-to-end architecture that predict answers conditioning on a question and an image.”
2. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler

“To align movies and books we exploit a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book.”
3. Learning to See by Moving Pulkit Agrawal, Joao Carreira, Jitendra Malik
“We show that using the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt using class-label as supervision on the tasks of scene recognition, object recognition, visual odometry and keypoint matching.”
4. Local Convolutional Features With Unsupervised Training for Image Retrieval Mattis Paulin, Matthijs Douze, Zaid Harchaoui, Julien Mairal, Florent Perronnin, Cordelia Schmid

“We introduce a deep convolutional architecture that yields patch-level descriptors, as an alternative to the popular SIFT descriptor for image retrieval.”
5. Deep Networks for Image Super-Resolution With Sparse Prior Zhaowen Wang, Ding Liu, Jianchao Yang, Wei Han, Thomas Huang

“We show that a sparse coding model particularly designed for super-resolution can be incarnated as a neural network, and trained in a cascaded structure from end to end.”
6. High-for-Low and Low-for-High: Efficient Boundary Detection From Deep Object Features and its Applications to High-Level Vision Gedas Bertasius, Jianbo Shi, Lorenzo Torresani

“In this work we show how to predict boundaries by exploiting object level features from a pretrained object-classification network.”
7. A Deep Visual Correspondence Embedding Model for Stereo Matching Costs Zhuoyuan Chen, Xun Sun, Liang Wang, Yinan Yu, Chang Huang

“A novel deep visual correspondence embedding model is trained via Convolutional Neural Network on a large set of stereo images with ground truth disparities. This deep embedding model leverages appearance data to learn visual similarity relationships between corresponding image patches, and explicitly maps intensity values into an embedding feature space to measure pixel dissimilarities.”
8. Im2Calories: Towards an Automated Mobile Vision Food Diary Austin Meyers, Nick Johnston, Vivek Rathod, Anoop Korattikara, Alex Gorban, Nathan Silberman, Sergio Guadarrama, George Papandreou, Jonathan Huang, Kevin P. Murphy

“We present a system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories.”
9. Unsupervised Visual Representation Learning by Context Prediction Carl Doersch, Abhinav Gupta, Alexei A. Efros

“How can one write an objective function to encourage a representation to capture, for example, objects, if none of the objects are labeled?”
10. Deep Neural Decision Forests Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, Samuel Rota Bulò

“We introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network.”
11. Conditional Random Fields as Recurrent Neural Networks Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr

“We formulate mean-field approximate inference for the Conditional Random Fields with Gaussian pairwise potentials as Recurrent Neural Networks.”
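To make the idea concrete, here is a minimal Python sketch of mean-field inference written as a loop of identical update steps, which is the property that lets it be unrolled like a recurrent network. It is not the authors' code: the 1-D toy signal, the intensity-only Gaussian kernel, the Potts-style label compatibility, and every name and parameter (mean_field_step, weight, bandwidth) are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Normalise a list of scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_kernel(features, bandwidth):
    """Pairwise Gaussian affinities between pixels (no self-connection)."""
    n = len(features)
    return [[0.0 if i == j else
             math.exp(-((features[i] - features[j]) ** 2) / (2 * bandwidth ** 2))
             for j in range(n)] for i in range(n)]

def mean_field_step(q, unary, kernel, weight):
    """One mean-field update; every iteration reuses the same parameters."""
    n, num_labels = len(q), len(q[0])
    new_q = []
    for i in range(n):
        scores = []
        for label in range(num_labels):
            # Message passing: kernel-weighted sum of the other pixels' beliefs.
            msg = sum(kernel[i][j] * q[j][label] for j in range(n))
            # Potts-style compatibility: agreeing with similar pixels is rewarded.
            scores.append(-unary[i][label] + weight * msg)
        new_q.append(softmax(scores))
    return new_q

def mean_field(unary, features, iterations=5, weight=1.0, bandwidth=1.0):
    """Run a fixed number of identical steps, like an unrolled recurrent net."""
    kernel = gaussian_kernel(features, bandwidth)
    q = [softmax([-u for u in row]) for row in unary]  # initialise from unaries
    for _ in range(iterations):
        q = mean_field_step(q, unary, kernel, weight)
    return q

# Toy example: five "pixels"; the third one has a noisy unary that prefers label 1,
# even though its intensity matches the first two pixels.
intensities = [0.0, 0.1, 0.05, 1.0, 0.9]
class_probs = [[0.9, 0.1], [0.9, 0.1], [0.45, 0.55], [0.1, 0.9], [0.1, 0.9]]
unary = [[-math.log(p) for p in row] for row in class_probs]

q = mean_field(unary, intensities, iterations=5, weight=1.0, bandwidth=0.2)
print([row.index(max(row)) for row in q])  # [0, 0, 0, 1, 1]
```

Taking the unaries alone would label the third pixel 1. The pairwise term pulls it back to label 0 because its intensity is close to that of its neighbours.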
12. Flowing ConvNets for Human Pose Estimation in Videos Tomas Pfister, James Charles, Andrew Zisserman

“We investigate a ConvNet architecture that is able to benefit from temporal context by combining information across the multiple frames using optical flow.”
13. Dense Optical Flow Prediction From a Static Image Jacob Walker, Abhinav Gupta, Martial Hebert

“Given a static image, P-CNN predicts the future motion of each and every pixel in the image in terms of optical flow. Our P-CNN model leverages the data in tens of thousands of realistic videos to train our model. Our method relies on absolutely no human labeling and is able to predict motion based on the context of the scene.”
14. DeepBox: Learning Objectness With Convolutional Networks Weicheng Kuo, Bharath Hariharan, Jitendra Malik

“Our framework, which we call DeepBox, uses convolutional neural networks (CNNs) to rerank proposals from a bottom-up method.”
15. Active Object Localization With Deep Reinforcement Learning Juan C. Caicedo, Svetlana Lazebnik

“This agent learns to deform a bounding box using simple transformation actions, with the goal of determining the most specific location of target objects following top-down reasoning.”
16. Predicting Depth, Surface Normals and Semantic Labels With a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

“We address three different computer vision tasks using a single multiscale convolutional network architecture: depth prediction, surface normal estimation, and semantic labeling.”
17. HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis DeCoste, Wei Di, Yizhou Yu

“We introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers.”
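The coarse-to-fine idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the two coarse groups, the four class names, the probability values, and the function combine_predictions are all hypothetical, and the combination rule used here is a plain average of the fine classifiers' outputs weighted by the coarse probabilities.

```python
def combine_predictions(coarse_probs, fine_probs_per_group):
    """Average the fine classifiers' outputs, weighted by coarse probabilities.

    coarse_probs[k]            : probability that the image belongs to coarse group k
    fine_probs_per_group[k][c] : probability of fine class c according to the
                                 fine classifier attached to coarse group k
    """
    num_classes = len(fine_probs_per_group[0])
    total = sum(coarse_probs)
    return [
        sum(w * fine[c] for w, fine in zip(coarse_probs, fine_probs_per_group)) / total
        for c in range(num_classes)
    ]

# Hypothetical example: fine classes are (cat, dog, car, truck),
# coarse groups are (animal, vehicle).
classes = ["cat", "dog", "car", "truck"]
coarse = [0.8, 0.2]                      # coarse classifier: probably an animal
fine_animal = [0.3, 0.7, 0.0, 0.0]       # fine classifier for the animal group
fine_vehicle = [0.0, 0.0, 0.5, 0.5]      # fine classifier for the vehicle group

final = combine_predictions(coarse, [fine_animal, fine_vehicle])
best = classes[final.index(max(final))]
print(best)  # dog
```

The easy decision (animal vs. vehicle) is handled by the coarse scores, while the hard decision (cat vs. dog) is left to a classifier that only has to specialise within one group.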
18. FlowNet: Learning Optical Flow With Convolutional Networks Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox

“We construct appropriate CNNs which are capable of solving the optical flow estimation problem as a supervised learning task.”
19. Understanding Deep Features With Computer-Generated Imagery Mathieu Aubry, Bryan C. Russell

“Rendered images are presented to a trained CNN and responses for different layers are studied with respect to the input scene factors.”
20. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization Alex Kendall, Matthew Grimes, Roberto Cipolla

“Our system trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner with no need of additional engineering or graph optimisation.”
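Regressing a 6-DOF pose means predicting a 3-D position plus an orientation, and a training loss has to balance the two. The Python sketch below shows one plausible form of such a loss, with the orientation written as a quaternion and a scale factor beta trading off the two error terms. The function names, the quaternion representation, and the value of beta are assumptions for illustration, not details taken from the quote above.

```python
import math

def l2_norm(v):
    """Euclidean length of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))

def pose_loss(pred_xyz, pred_quat, true_xyz, true_quat, beta=500.0):
    """Position error plus beta times orientation error.

    pred_xyz / true_xyz   : camera position (x, y, z)
    pred_quat / true_quat : camera orientation as a quaternion (w, x, y, z)
    beta                  : hypothetical weight balancing the two terms
    """
    position_error = l2_norm([p - t for p, t in zip(pred_xyz, true_xyz)])
    # Normalise the ground-truth quaternion so it represents a valid rotation.
    norm = l2_norm(true_quat)
    unit_quat = [q / norm for q in true_quat]
    orientation_error = l2_norm([p - t for p, t in zip(pred_quat, unit_quat)])
    return position_error + beta * orientation_error

# A prediction that is 5 units off in position but has the right orientation.
identity = (1.0, 0.0, 0.0, 0.0)
print(pose_loss((3.0, 4.0, 0.0), identity, (0.0, 0.0, 0.0), identity))  # 5.0
```

Because position is measured in scene units and quaternion differences are bounded, beta usually has to be tuned per scene so that neither term dominates.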
21. Visual Tracking With Fully Convolutional Networks Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu

“A new approach for general object tracking with fully convolutional neural network.”
Conclusion
While some may argue that the great convergence upon ConvNets is making the field less diverse, it is actually making the techniques easier to comprehend. It is easier to “borrow breakthrough thinking” from one research direction when the core computations are cast in the language of ConvNets. Using ConvNets, a properly trained (and motivated!) 21-year-old graduate student is actually able to compete on benchmarks, where previously it would take an entire 6-year PhD cycle to compete on a non-trivial benchmark.
See you next week in Chile!
Update (January 13th, 2016)
Achievement awards
- PAMI Distinguished Researcher Award (1): Yann LeCun
- PAMI Distinguished Researcher Award (2): David Lowe
- PAMI Everingham Prize Winner (1): Andrea Vedaldi for VLFeat
- PAMI Everingham Prize Winner (2): Daniel Scharstein and Rick Szeliski for the Middlebury Datasets
Paper awards
- PAMI Helmholtz Prize (1): David Martin, Charles Fowlkes, Doron Tal, and Jitendra Malik for their ICCV 2001 paper “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics”.
- PAMI Helmholtz Prize (2): Serge Belongie, Jitendra Malik, and Jan Puzicha, for their ICCV 2001 paper “Matching Shapes”.
- Marr Prize: Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulò, for “Deep Neural Decision Forests”.
- Marr Prize honorable mention: Saining Xie and Zhuowen Tu for “Holistically-Nested Edge Detection”.
Friday, March 11, 2016, 10:45