As I walked through the large poster-filled hall at CVPR 2013, I asked myself, “Quo vadis, computer vision?” (Where are you going, computer vision?)  I saw lots of papers exploiting last year’s ideas, copious amounts of incremental research, and an overabundance of off-the-shelf computational techniques being recombined in seemingly novel ways.  When you have been active in computer vision research for several years, it is not rare to find yourself bored by a significant fraction of the papers at research conferences.  Right after the main CVPR conference, I felt mentally drained and needed a breath of fresh air, so I spent several days checking out the sights in Oregon.  Here is one picture -- proof that CVPR 2013 had more to offer than ideas!

When I returned from sightseeing, I took a more circumspect look at the field of computer vision.  I immediately noticed that vision research is actually advancing and growing in a healthy way.  (Unfortunately, most junior students have a hard time determining which research papers are actually novel and/or significant.)  A handful of new research themes arise each year, and today I’d like to briefly discuss three new computer vision research themes which are likely to rise in popularity in the foreseeable future (2-5 years).
 
1) RGB-D input data is trending.  
 
Many of this year’s papers take a single 2.5D RGB-D image as input and try to parse the image into its constituent objects.  The number of papers doing this with RGB-D data is seemingly infinite.  Some other CVPR 2013 approaches don’t try to parse the image, but instead do something else, such as fitting cuboids, reasoning about affordances in 3D, or reasoning about illumination.  The reason why such inputs are becoming more popular is simple: RGB-D images can be obtained via cheap and readily available sensors such as Microsoft’s Kinect.  Depth measurements used to be obtained by expensive time-of-flight sensors (in the late 90s and early 00s), but as of 2013, $150 can buy you one of these depth-sensing bad-boys!  In fact, I bought a Kinect just because I thought that it might come in handy one day -- and since joining MIT, I’ve been delving into the RGB-D reconstruction domain on my own.  It is just a matter of time until the newest iPhone has an on-board depth sensor, so the current line of research which relies on RGB-D input is likely to become the norm within a few years.
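
To make "2.5D" a bit more concrete, here is a minimal sketch of how a depth image can be turned into 3D points with the standard pinhole camera model.  It is only an illustration, not the method of any particular CVPR paper: the function name `backproject_depth` is hypothetical, and the intrinsics (`fx`, `fy`, `cx`, `cy`) and the tiny depth map are made-up toy values, not real Kinect calibration.  I also assume that a depth of zero or less means "no reading".

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(depth: List[List[float]],
                      fx: float, fy: float,
                      cx: float, cy: float) -> List[Point3D]:
    """Turn a row-major depth map (in metres) into camera-frame 3D points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Pixels with depth <= 0 are treated as missing readings and skipped
    (an assumption of this sketch).
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v indexes image rows
        for u, z in enumerate(row):        # u indexes image columns
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth map; 0.0 marks a pixel with no depth reading.
depth_map = [[1.0, 1.0, 0.0],
             [2.0, 2.0, 2.0]]
cloud = backproject_depth(depth_map, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
```

Each valid pixel gives exactly one 3D point on the visible surface and nothing behind it, which is why a single RGB-D frame is usually called 2.5D rather than 3D.  A real pipeline would vectorize this with an array library and use the sensor's calibrated intrinsics.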
 
2) Mid-level patch discovery is a hot research topic.
Saurabh Singh from CMU introduced this idea in his seminal ECCV 2012 paper, and Carl Doersch applied this idea to large-scale Google Street View imagery in the “What makes Paris look like Paris?” SIGGRAPH 2012 paper.  The idea is to automatically extract mid-level patches (which could be objects, object parts, or just chunks of stuff) from images, under the constraint that the extracted patches are the most informative ones.

Unsupervised Discovery of Mid-Level Discriminative Patches. Saurabh Singh, Abhinav Gupta, Alexei A. Efros. In ECCV, 2012.

What Makes Paris Look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, Alexei A. Efros. In SIGGRAPH, 2012.

At CVPR 2013, it was evident that the idea of "learning mid-level parts for scenes" is being pursued by other top-tier computer vision research groups.  Here are some CVPR 2013 papers which capitalize on this idea:

Blocks that Shout: Distinctive Parts for Scene Classification. Mayank Juneja, Andrea Vedaldi, CV Jawahar, Andrew Zisserman. In CVPR, 2013.

Representing Videos using Mid-level Discriminative Patches. Arpit Jain, Abhinav Gupta, Mikel Rodriguez, Larry Davis. In CVPR, 2013.

Part Discovery from Partial Correspondence. Subhransu Maji, Gregory Shakhnarovich. In CVPR, 2013.

3) Deep learning and feature learning are on the rise within the computer vision community.
It seems that everybody at Google Research is working on deep learning.  Will it solve all vision problems?  Is it the one computational ring to rule them all?  Personally, I doubt it, but the rising presence of deep learning is forcing every researcher to brush up on their l33t backprop skillz.  In other words, if you don't know who Geoff Hinton is, then you are in trouble.

 
Work in progress... I will finish translating this post in a couple of days.
