[Repost] Three Trending Computer Vision Research Areas: Where CV Is Heading Over the Next Few Years, as Seen from CVPR

As I walked through the large poster-filled hall at CVPR 2013, I asked myself, “Quo vadis, Computer Vision?” (Where are you going, computer vision?) I saw lots of papers which exploit last year’s ideas, copious amounts of incremental research, and an overabundance of off-the-shelf computational techniques being recombined in seemingly novel ways. When you have been active in computer vision research for several years, it is not rare to find yourself bored by a significant fraction of the papers at research conferences. Right after the main CVPR conference, I felt mentally drained and needed a breath of fresh air, so I spent several days checking out the sights in Oregon. Here is one picture -- proof that CVPR 2013 had more to offer than ideas!

When I returned from sightseeing, I took a more circumspect look at the field of computer vision. I immediately noticed that vision research is actually advancing and growing in a healthy way. (Unfortunately, most junior students have a hard time determining which research papers are actually novel and/or significant.) A handful of new research themes arise each year, and today I’d like to briefly discuss three new computer vision research themes which are likely to rise in popularity in the foreseeable future (2-5 years).

1) RGB-D input data is trending.

Many of this year’s papers take a single 2.5D RGB-D image as input and try to parse the image into its constituent objects. The number of papers doing this with RGB-D data is seemingly infinite. Some other CVPR 2013 approaches don’t try to parse the image, but instead do something else, such as fitting cuboids, reasoning about affordances in 3D, or reasoning about illumination. The reason why such inputs are becoming more popular is simple: RGB-D images can be obtained via cheap and readily available sensors such as Microsoft’s Kinect. Depth measurements used to be obtained by expensive time-of-flight sensors (in the late 90s and early 00s), but as of 2013, $150 can buy you one of these depth-sensing bad boys! In fact, I bought a Kinect just because I thought that it might come in handy one day -- and since joining MIT, I’ve been delving into the RGB-D reconstruction domain on my own. It is just a matter of time until the newest iPhone has an on-board depth sensor, so the current line of research which relies on RGB-D input is likely to become the norm within a few years.

H. Jiang and J. Xiao. A Linear Approach to Matching Cuboids in RGBD Images. In CVPR 2013.
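To make the "2.5D" input concrete: an RGB-D frame stores one depth value per pixel, and a pinhole camera model turns those values into 3D points in the camera frame. The snippet below is a minimal sketch of that back-projection, not code from any of the papers mentioned here. The function name `depth_to_points`, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the convention that a depth of 0 means "no reading" are all assumptions made for illustration; real values come from the sensor's calibration.

```python
import numpy as np

def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project a depth map (in metres) into an (N, 3) array of camera-frame points.

    Assumes a pinhole camera with focal lengths fx, fy and principal point (cx, cy),
    all in pixels. Pixels with depth 0 are treated as missing (an assumption).
    """
    h, w = depth_m.shape
    # u holds the column index of every pixel, v holds the row index.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]

# Synthetic 4x4 depth map: a flat wall 2 m away with one missing reading.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0

# Hypothetical intrinsics, chosen only to keep the arithmetic easy to follow.
pts = depth_to_points(depth, fx=500.0, fy=500.0, cx=1.5, cy=1.5)
print(pts.shape)
```

Each surviving pixel becomes one 3D point, which is why a single RGB-D frame describes only the surfaces visible from the camera rather than a full 3D model.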

2) Mid-level patch discovery is a hot research topic.

Saurabh Singh from CMU introduced this idea in his seminal ECCV 2012 paper, and Carl Doersch applied it to large-scale Google Street View imagery in the “What Makes Paris Look like Paris?” SIGGRAPH 2012 paper. The idea is to automatically extract mid-level patches (which could be objects, object parts, or just chunks of stuff) from images, under the constraint that the extracted patches are the most informative ones.

Unsupervised Discovery of Mid-Level Discriminative Patches. Saurabh Singh, Abhinav Gupta, Alexei A. Efros. In ECCV, 2012.

Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. What Makes Paris Look like Paris? In SIGGRAPH 2012.
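One way to picture "most informative patches" is to group patches from two image sets and then rank the groups by how strongly they favour one set over the other. The snippet below is a toy sketch of that ranking step only. It uses raw-pixel features and plain k-means, which are simplifications chosen for brevity and not the procedure of the papers cited above. All function names (`extract_patches`, `kmeans_labels`, `rank_clusters`) and parameters are hypothetical.

```python
import numpy as np

def extract_patches(image, size, stride):
    """Return flattened, contrast-normalised square patches from a 2-D grayscale image."""
    patches = []
    for r in range(0, image.shape[0] - size + 1, stride):
        for c in range(0, image.shape[1] - size + 1, stride):
            p = image[r:r + size, c:c + size].astype(float).ravel()
            p = p - p.mean()
            norm = np.linalg.norm(p)
            patches.append(p / norm if norm > 0 else p)
    return np.array(patches)

def kmeans_labels(x, k, iters, rng):
    """Plain k-means; returns the cluster index assigned to each row of x."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a centre unchanged if its cluster is empty
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def rank_clusters(pos_patches, neg_patches, k=8, iters=10, seed=0):
    """Cluster all patches together, then rank clusters by the fraction of
    members that come from the positive set (their 'purity')."""
    rng = np.random.default_rng(seed)
    x = np.vstack([pos_patches, neg_patches])
    is_pos = np.arange(len(x)) < len(pos_patches)
    labels = kmeans_labels(x, k, iters, rng)
    scores = []
    for j in range(k):
        members = labels == j
        if members.sum() == 0:
            continue
        scores.append((j, float(is_pos[members].mean()), int(members.sum())))
    # Highest purity first: these clusters are the most specific to the positive set.
    return sorted(scores, key=lambda s: -s[1])

# Synthetic data: the "positive" image has vertical stripes, the "negative" one is noise.
rng = np.random.default_rng(0)
pos_img = np.tile(np.array([0.0, 1.0] * 8), (16, 1))
neg_img = rng.random((16, 16))
pos = extract_patches(pos_img, size=4, stride=4)
neg = extract_patches(neg_img, size=4, stride=4)
ranking = rank_clusters(pos, neg, k=4)
for cluster_id, purity, count in ranking:
    print(cluster_id, round(purity, 2), count)
```

Each tuple in `ranking` holds a cluster index, its purity, and its size. A cluster with high purity and many members contains patches that occur often in the positive set and rarely elsewhere, which is the intuition behind calling a patch discriminative.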

At CVPR 2013, it was evident that the idea of "learning mid-level parts for scenes" is being pursued by other top-tier computer vision research groups. Here are some CVPR 2013 papers which capitalize on this idea:

Blocks that Shout: Distinctive Parts for Scene Classification. Mayank Juneja, Andrea Vedaldi, C. V. Jawahar, Andrew Zisserman. In CVPR, 2013.

Representing Videos using Mid-level Discriminative Patches. Arpit Jain, Abhinav Gupta, Mikel Rodriguez, Larry Davis. In CVPR, 2013.

Part Discovery from Partial Correspondence. Subhransu Maji, Gregory Shakhnarovich. In CVPR, 2013.

3) Deep learning and feature learning are on the rise within the computer vision community.

It seems that everybody at Google Research is working on deep learning. Will it solve all vision problems? Is it the one computational ring to rule them all? Personally, I doubt it, but the rising presence of deep learning is forcing every researcher to brush up on their l33t backprop skillz. In other words, if you don't know who Geoff Hinton is, then you are in trouble.

Work in progress... I will finish translating this post in a couple of days.
