卷积神经网络和CIFAR-10:Yann LeCun专访 Convolutional Nets and CIFAR-10: An Interview with Yann LeCun
Recently Kaggle hosted a competition on the CIFAR-10 dataset. The CIFAR-10 dataset consists of 60k 32x32 colour images in 10 classes. This dataset was collected by Alex
Krizhevsky, Vinod Nair, and Geoffrey Hinton.
Many contestants used convolutional nets to tackle this competition. Some resulted in scores that beat human performance on this classification task. In this blog series we will interview three contestants and also a founding father of convolutional nets: Yann LeCun.

EXAMPLE OF CIFAR-10 DATASET
Yann LeCun
Yann Lecun is currently Director of AI Research at Facebook and a professor at NYU.
Which other scientists should be named and celebrated for the successes of convolutional nets?
Certainly, Kunihiko Fukushima's work on the Neo-Cognitron was an inspiration. Although the early forms of convnets owed little to the Neo-Cognitron, the version we settled on (with pooling layers) did.

A SCHEMATIC DIAGRAM ILLUSTRATING THE INTERCONNECTIONS BETWEEN LAYERS IN THE NEO-COGNITRON. FROM FUKUSHIMA K. (1980) NEOCOGNITRON: A SELF-ORGANIZING NEURAL NETWORK MODEL FOR A MECHANISM OF PATTERN RECOGNITION UNAFFECTED BY SHIFT IN POSITION.
Can you recount an aha!-moment or theoretical breakthrough during your early research into convolutional nets?
Not really. It was the logical thing to do. I had been playing with multi-layer nets with local connections since 1982 or so (not having the right learning algorithm though. Backprop didn't exist). I started experimenting with shared weight nets while I was a postdoc in Toronto in 1988.
The reason for not trying earlier is simply that I didn't have the software nor the data. Once I arrived at Bell Labs, I had access to a large dataset and fast computers (for the time). So I could try full-size convnets, and it worked amazingly well (though it required 2 weeks of training).
What is your opinion on the recent popularity of convolutional nets for object recognition? Did you expect it?
Yes. I knew it had to happen. It was a matter of time until the datasets become large enough and the computers powerful enough for deep learning algorithms to become better than human engineers at designing vision systems.
There was a symposium entitled "frontiers in computer vision" at MIT in August 2011. The title of my talk was “5 years from now, everyone will learn their features (you might as well start now)”. David Lowe (the inventor ofSIFT) said the same thing.

A SLIDE FROM THE TALK LECUN Y. (2011) “5 YEARS FROM NOW, EVERYONE WILL LEARN THEIR FEATURES (YOU MIGHT AS WELL START NOW)”.
Still, I was surprised by how fast the revolution happened and how much better convnets are, compared to other approaches. I would have expected the transition to be more gradual. Also, I would have expected unsupervised learning to play a greater role.
The character recognition model at AT&T was more than a simple classifier, but a complete pipeline. Can you tell more about the implementation problems your team faced?
We had to implement our own program language and write our own compiler to build this. Leon Bottou and I had written a neural net simulator called SN, back in 1987/1988. It was a Lisp interpreter with a numerical library (multidimensional arrays, neural net graphs…). We used this at Bell Labs to develop the first convnets.
Then in the early 90’s, we wanted to use our code in products. Initially, we hired a team of developers to convert our Lisp code to C/C++. But the resulting system could not be improved easily (it wasn’t a good platform for R&D). So Leon, Patrice Simard and I wrote a compiler for SN, which we used to develop the next generation OCR engine.
That system integrated a segmenter, a convnet, and a graphical model on top. The whole thing was trained end to end.
The graphical model was called a “graph transformer network”. It was conceptually similar to what we now call a conditional random field, or a structured perceptron (which it predates), but it allowed for non-linear scoring function (CRF and structured perceptrons can only have linear scoring functions).
The whole infrastructure was written in SN and compiled. This is the system that was deployed in ATM machines and check reading machines in 1996 and was reading 10 to 20% of all the checks in the US by the late 90’s.

AN ANIMATION SHOWING LENET 5 IN ACTION. FROM "INVARIANCE AND MULTIPLE CHARACTERS WITH SDNN (MULTIPLE CHARACTERS DEMO)".
In comparison with other methods, training convnets is pretty slow. How do you deal with the trade-off between experimentation and increased model training times? What does a typical development iteration look like?
In my experience, the best large-scale learning systems always take 2 or 3 weeks to train, regardless of the task, the method, the hardware, or the data.
I don’t know if convnets are “pretty slow”. Compared to what? They may be slow to train, but the alternative to “slow learning” is months of engineering efforts which doesn’t work as well in the end. Also, convnets are actually pretty fast to run (after training).
In a real application, no one really cares how long it takes to train. But people care a lot about how long it takes to run.
Which recent papers on convolutional nets are you most excited about? Any papers or ideas we should look out for?
There are lots and lots of ideas surrounding convnets and deep learning that have lived in relative obscurity for the last 20 years or so. No ones cared about it, and getting papers published was always a struggle. So, lots of ideas were never properly tried, never published, or were tried and published but soundly ignored and quickly forgotten. Who remembers that the first learning-based face detector that actually worked was a convolutional net (back in 1993, eight years before Viola-Jones)?

A FIGURE WITH PREDICTIONS FROM VAILLANT R., MONROCQ C., LECUN Y. (1993) "AN ORIGINAL APPROACH FOR THE LOCALISATION OF OBJECTS IN IMAGES".
Today, it’s really amazing to see so many young and bright people devoting so much creative energy to the topic and coming up with new ideas and new applications. The hardware / software infrastructure is getting better, and it’s becoming possible to train large networks in a few hours or a few days. So people can explore many more ideas that in the past.
One thing I’m excited about is the idea of “spectral convolutional net”. This was a paper at ICLR 2014 by folks from my NYU lab about a generalization of convolutional nets that can be applied to any graphs (regular convnets can be applied to 1D, 2D or 3D arrays that can be seen as regular grids in terms of graph). There are practical issues, but it opens the door to many more applications of convnets to unstructured data.

MNIST DIGITS ON A SPHERE. FROM BRUNA J., ZAREMBA W., SZLAM A., LECUN Y. (2013) "SPECTRAL NETWORKS AND DEEP LOCALLY CONNECTED NETWORKS ON GRAPHS".
I’m very excited about the application of convnets (and recurrent nets) to natural language understanding (following the seminal work of Collobert and Weston).
Since the error rate of a human is estimated to be around 6%, and Dr. Graham showed results of 4.47%, do you consider CIFAR-10 to be a solved problem?
It’s a solved problem in the same sense as MNIST is a solved problem. But frankly, people are more interested inImageNet than in CIFAR-10 nowadays. In that sense, CIFAR-10 is not a “real” problem. But it’s not a bad benchmark for a new algorithm.
What would it take for convnets to see a much wider adoption in the industry? Will training convnets and the software to set them up become less challenging?
What are you talking about? Convnets are absolutely everywhere now (or about to be everywhere) in industry:Facebook, Google, Microsoft, IBM, Baidu, NEC, Twitter, Yahoo!….
That said, it’s true that all of these companies have significant R&D resources and that training convnets can still be challenging for smaller companies or companies that are less technically advanced.
It still requires quite of bit of experience and time investment to train a convnet if you don’t have prior training. Soon however, there will be several simple to use open source packages with efficient back-ends for that.
Are we close to the limit for convnets? Or could CIFAR-100 be "solved" next?
I don’t think it’s a good test. ImageNet is a much better test.
Shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional architectures. Do deep nets really need to be deep?
Yes, deep nets need to be deep. Try to train a shallow net to emulate a deep convnet trained on ImageNet. Come back when you have done that. In theory, a deep net can be approximated by a shallow one. But on complex tasks, the shallow net will have to be be ridiculously large.
Most of your academic work is highly practical in nature. Is this something you purposefully aim for, or is this an artefact of being employed by companies? Can you tell about the distinction between theory and practice?
Hey, I’ve been in academia since 2003, and I’m still a part-time professor at NYU. I do theory when it helps me understand things. Theory often help us understand what’s possible and what’s not possible. It helps suggest proper ways to do things.
But sometimes theory restricts our thinking. Some people will not work with some models because the theory about them is too difficult. But often, a technique works well before the reasons for it working well are fully understood theoretically.
By restricting yourself to work on stuff you fully understand theoretically, you are condemned to using conceptually simple methods.
Also, sometimes theory blinds us. For example, some people were dazzled by kernel methods because of the cute math that goes with it. But, as I’ve said in the past, in the end, kernel machines are shallow networks that perform “glorified template matching”. There is nothing wrong with that (SVM is a great method), but it has dire limitations that we should all be aware of.

A SLIDE FROM LECUN Y. (2013) LEARNING HIERARCHIES FROM INVARIANT FEATURES
What is your opinion on a well-performing convnet without any theoretical justifications for why it should work so well? Do you generally favor performance over theory? Where do you place the balance?
I don’t think their is a choice to make between performance and theory. If there is performance, there will be theory to explain it.
Also, what kind of theory are we talking about? Is it a generalization bound? Convnets have a finite VC dimension, hence they are consistent and admit the classical VC bounds. What more do you want? Do you want a tighter bound, like what you get for SVMs? No theoretical bound that I know of is tight enough to be useful in practice. So I really don’t understand the point. Sure, generic VC bounds are atrociously non tight, but non-generic bounds (like for SVMs) are only slightly less atrociously non tight.
If what you desire are convergence proofs (or guarantees), that’s a little more complicated. The loss function of multi-layer nets is non convex, so the easy proofs that assume convexity are out the window. But we all know that in practice, a convnet will almost always converge to the same level of performance, regardless of the starting point (if the initialization is done properly). There is theoretical evidence that there are lots and lots of equivalent local minima and a very small number of “bad” local minima. Hence convergence is rarely a problem.
What is your opinion on AI hype. Which practices do you think are detrimental to the field (of AI in general and specifically convnets)?
AI hype is extremely dangerous. It killed various approaches to AI at least 4 times in the past. I keep calling out hype whenever I see it, whether it’s from the press, from startups looking for investors, from large companies looking for PR, or from academics looking for grants.
There is certainly quite a bit of hype around deep learning at the moment. I don’t see a particularly high level of hype around convnets specifically. There is more hype around “cortical this”, “spiking that”, and “neuromorphic blah”. Unlike many of these things, convnets actually yield good results on useful tasks and are widely deployed in industrial applications.
Any interesting projects at Facebook involving convnets that you could talk a little more about? Some basic stats about the size?
DeepFace: a convnet for face recognition. There are also convnets for image tagging. They are big.

A FIGURE DESCRIBING THE ARCHITECTURE FROM THE PRESENTATION "TAIGMAN Y., YANG M., RANZATO M., WOLF L. (2014) DEEPFACE FOR UNCONSTRAINED FACE RECOGNITION".
Recently you posted about 4 types of serious researchers. How would you label yourself?
I’m a 3, with a bit of 1 and 4.
- "People who want to explain/understand learning (and perhaps intelligence) at the fundamental/theoretical level.
- People who want to solve practical problems and have no interest in neuroscience.
- People who want to understand intelligence, build intelligent machines, and have a side interest in understanding how the brain works.
- People whose primary interest is to understand how the brain works, but feel they need to build computer models that actually work in order to do so."
Anything you wish to say to the top contestants in the CIFAR-10 challenge? Anything you wish to say to (hobbyist) researchers studying convnets? Anything in general you wish to say about the CIFAR dataset/problem?
I’m impressed by how much creativity and engineering knack went into this. It’s nice that people have pushed the technique as far as it will go on this dataset.
But it’s going to get easier and easier for independent researchers and hobbyist to play with these things and apply them to larger datasets. I think the successor to CIFAR-10 should be ImageNet-1K-128x128. This would be a version of the 1000 category ImageNet classification task where the images have been normalized to 128x128. I see several advantages:
- the networks are small enough to be trainable in a reasonable amount of time on a high-end gamer rig;
- the network you get at the end can actually be used for useful application (like robot vision);
- the network can be run in real time on embedded platforms, like smart phones or the NVIDIA Jetson TK1.

PREDICTIONS ON IMAGENET. FROM "KRIZHEVSKY A., SUTSKEVER I., HINTON. G.E. (2012) IMAGENET CLASSIFICATION WITH DEEP CONVOLUTIONAL NEURAL NETWORKS".
The need to have large amounts of labeled data can be a problem. What is your opinion on nets trained on unlabeled data, or the automatic labeling of data through image search engines?
There are tasks like video understanding and natural language understanding where we are going to have to use unsupervised learning. But these modalities have a temporal dimension that changes how we can approach the problem.
Clearly, we need to devise algorithms that can learn the structure of the perceptual world without being told the name of everything. Many of us have been working on this for years (if not decades), but none of us has a perfect solution.
What is your latest research focusing on?
There are two answers to this question:
- Projects I’m personally involved in (enough that I would be co-author on the papers);
- projects that I set the stage for, encourage other work on, and advise at the conceptual level, but in which I am not involved enough to be co-author on a paper.
A lot of (1) is at NYU and a lot of (2) is at Facebook.
The general areas are:
unsupervised learning that discovers “invariant” features, the marriage of deep learning and structured prediction, the unification of supervised and unsupervised learning, solving the problem of learning long-term dependencies, building learning systems with short-term/scratchpad memory, learning plans and sequences of actions, different ways to optimize functions than to follow the gradient, the integration of representation learning with reasoning (read Leon Bottou’s excellent position paper “from machine learning to machine reasoning”), the use of learning to perform inference efficiently, and many other topics.
from: http://blog.kaggle.com/2014/12/22/convolutional-nets-and-cifar-10-an-interview-with-yan-lecun/
卷积神经网络和CIFAR-10:Yann LeCun专访 Convolutional Nets and CIFAR-10: An Interview with Yann LeCun的更多相关文章
- 卷积神经网络图像纹理合成 Texture Synthesis Using Convolutional Neural Networks
代码实现 概述 这是关于Texture Synthesis Using Convolutional Neural Networks论文的tensorflow2.0代码实现,使用keras预训练的VGG ...
- 人脸检测及识别python实现系列(4)——卷积神经网络(CNN)入门
人脸检测及识别python实现系列(4)——卷积神经网络(CNN)入门 上篇博文我们准备好了2000张训练数据,接下来的几节我们将详细讲述如何利用这些数据训练我们的识别模型.前面说过,原博文给出的训练 ...
- 图卷积神经网络(GCN)入门
图卷积网络Graph Convolutional Nueral Network,简称GCN,最近两年大热,取得不少进展.不得不专门为GCN开一个新篇章,表示其重要程度.本文结合大量参考文献,从理论到实 ...
- 【翻译】TensorFlow卷积神经网络识别CIFAR 10Convolutional Neural Network (CNN)| CIFAR 10 TensorFlow
原网址:https://data-flair.training/blogs/cnn-tensorflow-cifar-10/ by DataFlair Team · Published May 21, ...
- 【原创】梵高油画用深度卷积神经网络迭代10万次是什么效果? A neural style of convolutional neural networks
作为一个脱离了低级趣味的码农,春节假期闲来无事,决定做一些有意思的事情打发时间,碰巧看到这篇论文: A neural style of convolutional neural networks,译作 ...
- 深度学习FPGA实现基础知识10(Deep Learning(深度学习)卷积神经网络(Convolutional Neural Network,CNN))
需求说明:深度学习FPGA实现知识储备 来自:http://blog.csdn.net/stdcoutzyx/article/details/41596663 说明:图文并茂,言简意赅. 自今年七月份 ...
- [DeeplearningAI笔记]卷积神经网络3.10候选区域region proposals与R-CNN
4.3目标检测 觉得有用的话,欢迎一起讨论相互学习~Follow Me 3.10 region proposals候选区域与R-CNN 基于滑动窗口的目标检测算法将原始图片分割成小的样本图片,并传入分 ...
- [DeeplearningAI笔记]卷积神经网络4.6-4.10神经网络风格迁移
4.4特殊应用:人脸识别和神经网络风格转换 觉得有用的话,欢迎一起讨论相互学习~Follow Me 4.6什么是神经网络风格转换neural style transfer 将原图片作为内容图片Cont ...
- Python机器学习笔记:卷积神经网络最终笔记
这已经是我的第四篇博客学习卷积神经网络了.之前的文章分别是: 1,Keras深度学习之卷积神经网络(CNN),这是开始学习Keras,了解到CNN,其实不懂的还是有点多,当然第一次笔记主要是给自己心中 ...
随机推荐
- 搭建SpringMVC+MyBatis开发框架一
大部分 Java 应用都是 Web 应用,展现层是 Web 应用不可忽略的重要环节.Spring 为展现层提供了一个优秀的 Web 框架—— Spring MVC.和众多其他 Web 框架一样,它基于 ...
- liferay MVCActionCommand的用法及例子
在liferay7中把portlet中的控制层拆成了3个部分: 1.MVCActionCommand 2.MVCRenderCommand 3.MVCRecourceCommand 至于为什么要拆出来 ...
- 一些 Shell 脚本(持续更新)
1. 启动日志分析 启动日志格式如下: 开机时间:2015/05/13 周三 16:45:17.79 关机时间:2015/05/13 周三 18:46:03.91 开机时间:2015/05/14 周四 ...
- android开发 缩放到指定比例的尺寸
一种通过matrix矩阵缩放: //使用Bitmap加Matrix来缩放 public static Drawable resizeImage(Bitmap bitmap, int w, int h) ...
- 利用 js 实现弹出蒙板(model)功能
关于 js 实现一个简单的蒙板功能(model) 思路: 创建一个蒙板, 设置蒙板的堆叠顺序保证能将其它元素盖住 position: absolute; top: 0; left: 0; displa ...
- 【CentOS】samba服务器安装与配置
参考资料: http://www.cnblogs.com/mchina/archive/2012/12/18/2816717.html 1.简介 2.安装 3.配置 1.简介 Samba是一个能让Li ...
- Cross Validation done wrong
Cross Validation done wrong Cross validation is an essential tool in statistical learning 1 to estim ...
- 对话框Dialog
QMainWindow QMainWindow是 Qt 框架带来的一个预定义好的主窗口类. 主窗口,就是一个普通意义上的应用程序(不是指游戏之类的那种)最顶层的窗口.通常是由一个标题栏,一个菜单栏,若 ...
- self._raiseerror(v) File "D:\GameDevelopment\Python27\lib\xml\etree\ElementTree.py", line 1506, in _raiseerror
D:\BaiDuYun\Plist>python unpack_plist.py lobbyRelieveTraceback (most recent call last): File &quo ...
- tomcat 多开设置 需要需改的3个端口
启动多tomcat需要需改的3个端口 我所用Tomcat服务器都为zip版,非安装版.以两个为例: 安装第二个Tomcat完成后,到安装目录下的conf子目录中打开server.xml文件,查找以下三 ...