The major advancements in Deep Learning in 2016

Pablo Tue, Dec 6, 2016 in MACHINE LEARNING

Deep Learning has been the core topic in the Machine Learning community over the last couple of years, and 2016 was no exception. In this article, we will go through the advancements we think have contributed the most (or have the potential) to move the field forward, and how organizations and the community are making sure that these powerful technologies are used in a way that benefits everyone.

One of the main challenges researchers have historically struggled with has been unsupervised learning. We think 2016 has been a great year for this area, mainly because of the vast amount of work on Generative Models.

Moreover, the ability to communicate naturally with machines has long been one of the dream goals, and several approaches have been presented by giants like Google and Facebook. In this context, 2016 was all about innovation in Natural Language Processing (NLP) problems, which are crucial to reaching this goal.

Unsupervised learning

Unsupervised learning refers to the task of extracting patterns and structure from raw data without extra information, as opposed to supervised learning where labels are needed.

The classical approach to this problem using neural networks has been autoencoders. The basic version consists of a Multilayer Perceptron (MLP) where the input and output layers have the same size and a smaller hidden layer is trained to recover the input. Once trained, the output of the hidden layer corresponds to a data representation that can be useful for clustering, dimensionality reduction, improving supervised classification and even data compression.
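
As a concrete illustration, here is a minimal autoencoder sketch using Keras (one of the libraries listed at the end of this post). The 784-dimensional input (a flattened MNIST-style image) and the layer sizes are illustrative assumptions, not a prescription:

```python
# Minimal autoencoder sketch: an MLP trained to reproduce its own input.
from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(784,))                          # input layer
hidden = Dense(32, activation='relu')(inputs)         # smaller hidden layer: the learned representation
outputs = Dense(784, activation='sigmoid')(hidden)    # output layer, same size as the input

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, ...)  # note: the target is the input itself

# Once trained, the hidden activations serve as the compressed representation:
encoder = Model(inputs, hidden)
# codes = encoder.predict(x_train)
```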

Generative Adversarial Networks (GANs)

Recently, a new approach based on generative models has emerged. Called Generative Adversarial Networks (GANs), it has enabled models to tackle unsupervised learning. GANs are a real revolution: such has been the impact of this research that in this presentation Yann LeCun (one of the fathers of Deep Learning) said that GANs are the most important idea in Machine Learning in the last 20 years.

Although introduced in 2014 by Ian Goodfellow, it is in 2016 that GANs started to show their real potential. Improved training techniques and better architectures (such as the Deep Convolutional GAN) introduced this year have fixed some of the previous limitations, and new applications (we list some of them later) are revealing how powerful and flexible they can be.

The intuitive idea

Imagine an aspiring painter who wants to make a living from art forgery (G), and someone who wants to earn his living by judging paintings (D). You start by showing D some examples of Picasso's work. Then G produces paintings in an attempt to fool D, making him believe every time that they are Picasso originals. Sometimes he succeeds; however, as D learns more about Picasso's style (by looking at more examples), G has a harder time fooling D, so he has to do better. As this process continues, not only does D get really good at telling apart real Picassos from forgeries, but G also gets really good at forging Picasso paintings. This is the idea behind GANs.

Technically, GANs consist of a constant push and pull between two networks (hence "adversarial"): a generator (G) and a discriminator (D). Given a set of training examples (such as images), we can imagine that there is an underlying distribution p(x) that governs them. With GANs, G generates outputs and D decides whether they come from the same distribution as the training set or not.

G starts from some noise z, so the generated images are G(z). D takes both real images (from the training distribution) and fake ones (from G) and classifies them: D(x) and D(G(z)).

[Figure: How a GAN works.]
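
To make the adversarial setup concrete, here is a schematic training loop in Keras. The tiny dense networks and 784-dimensional "images" are our own simplifying assumptions; the point is the alternating scheme, not the architecture of any particular paper:

```python
# Schematic GAN training loop: D and G are trained in alternation.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

z_dim = 100
G = Sequential([Dense(256, activation='relu', input_dim=z_dim),
                Dense(784, activation='sigmoid')])       # G: noise -> image
D = Sequential([Dense(256, activation='relu', input_dim=784),
                Dense(1, activation='sigmoid')])         # D: image -> P(real)
D.compile(optimizer='adam', loss='binary_crossentropy')

D.trainable = False                 # freeze D inside the stacked model
gan = Sequential([G, D])            # G's gradients flow through D's judgment
gan.compile(optimizer='adam', loss='binary_crossentropy')

def train_step(x_real):
    n = len(x_real)
    x_fake = G.predict(np.random.normal(size=(n, z_dim)))   # G(z)
    # D learns to score real images as 1 and fakes as 0: D(x) vs D(G(z)).
    D.train_on_batch(x_real, np.ones((n, 1)))
    D.train_on_batch(x_fake, np.zeros((n, 1)))
    # G learns to make D score its samples as real.
    gan.train_on_batch(np.random.normal(size=(n, z_dim)), np.ones((n, 1)))
```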

D and G learn simultaneously, and once G is trained, it knows enough about the distribution of the training samples to generate new samples that share very similar properties:

[Figure: Images generated by a GAN.]

These images were generated by a GAN trained on CIFAR-10. If you pay attention to the details, you can see that they are not real objects. However, there is something about them that captures a certain concept, making them look real from a distance.

InfoGAN

Recent developments have extended the GAN idea not only to approximate the data distribution, but also to learn interpretable, useful vector representations of the data. These vector representations need to capture rich information (as in autoencoders), and they also need to be interpretable, meaning that we can distinguish which parts of the vector produce a specific type of transformation in the generated outputs.

The InfoGAN model, proposed by OpenAI researchers in August, addresses this issue. In a nutshell, InfoGAN is able to learn representations that contain information about the dataset in an unsupervised way. For instance, when applied to the MNIST dataset, it is able to infer the type of digit (1, 2, 3, …), the rotation and the width of the generated samples without the need for manually tagged data.
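
To give an idea of what this looks like in practice, here is a numpy sketch of how the generator's input is assembled in the MNIST setup described in the paper: ordinary noise plus structured codes that a mutual-information term keeps recoverable from the output. The sizes follow the paper, but treat this as an illustration rather than the authors' code:

```python
# The InfoGAN latent input: unstructured noise z plus interpretable codes c.
import numpy as np

def sample_infogan_latent(batch_size, z_dim=62):
    z = np.random.uniform(-1, 1, size=(batch_size, z_dim))    # unstructured noise
    digit = np.eye(10)[np.random.randint(0, 10, batch_size)]  # categorical code: digit type
    style = np.random.uniform(-1, 1, size=(batch_size, 2))    # continuous codes: e.g. rotation, width
    return np.concatenate([z, digit, style], axis=1)          # generator input G(z, c)
```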

Conditional GANs

Another extension of GANs is the class of models called Conditional GANs (cGANs). These models are able to generate samples that take into account external information (a class label, text, another image), using it to force G to produce a particular type of output. Several applications based on this idea have surfaced recently.
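
A minimal Keras sketch of how conditioning enters the picture: both G and D receive the external information (here a one-hot class label) alongside their usual inputs. The architecture and sizes are illustrative assumptions, not taken from any specific paper:

```python
# Conditional GAN sketch: the label is concatenated into both networks.
from keras.models import Model
from keras.layers import Input, Dense, concatenate

z = Input(shape=(100,))                 # noise
label = Input(shape=(10,))              # external information: one-hot class label

hg = Dense(256, activation='relu')(concatenate([z, label]))
fake = Dense(784, activation='sigmoid')(hg)      # G(z | label)
G = Model([z, label], fake)

x = Input(shape=(784,))
hd = Dense(256, activation='relu')(concatenate([x, label]))
real_prob = Dense(1, activation='sigmoid')(hd)   # D(x | label)
D = Model([x, label], real_prob)
```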

You can learn more about generative models in this blog post or in this talk by Ian Goodfellow.

Natural Language Processing

In order to have fluent conversations with machines, several problems need to be solved first: text understanding, question answering and machine translation.

Text understanding

Salesforce MetaMind has built a new model called the Joint Many-Task (JMT) model, with the objective of creating a single model able to learn five common NLP tasks:

  • Part-of-speech tagging: assign parts of speech to each word, such as noun, verb or adjective.
  • Chunking: also called shallow parsing; involves a range of tasks, like finding noun or verb groups.
  • Dependency parsing: identify syntactic relationships (such as an adjective modifying a noun) between words.
  • Semantic relatedness: measure the semantic distance between two sentences; the result is a real-valued score.
  • Textual entailment: determine whether a premise sentence entails a hypothesis sentence. Possible classes: entailment, contradiction and neutral.

The magic behind this model is that it is end-to-end trainable, which allows collaboration between the different layers: results from the higher layers (the more complex tasks) can improve the lower layers (the less complex tasks), as the sketch below illustrates. This is new compared to older ideas, where lower layers could only be used to improve higher-level ones, but not the other way around. As a result, the model achieves state-of-the-art results on all of the tasks except POS tagging (where it came in second place).
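
As a loose sketch of the stacking idea, here is a Keras toy model in which each task layer sees the shared features plus the predictions of the layers below, and everything is trained jointly. This is our own heavy simplification; the real model uses bi-LSTMs operating per token, shortcut connections and a successive-regularization training scheme:

```python
# JMT-style stacking sketch: higher task layers reuse lower task outputs.
from keras.models import Model
from keras.layers import Input, Dense, concatenate

words = Input(shape=(128,))                    # shared features (size assumed)
pos = Dense(45, activation='softmax')(words)   # lower layer: POS tagging
chunk = Dense(23, activation='softmax')(concatenate([words, pos]))   # chunking sees POS
relate = Dense(1)(concatenate([words, pos, chunk]))  # higher layer: relatedness score

jmt = Model(words, [pos, chunk, relate])       # one model, trained end-to-end on all tasks
jmt.compile(optimizer='adam',
            loss=['categorical_crossentropy', 'categorical_crossentropy', 'mse'])
```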

Question Answering

MetaMind also presented a new model for the question answering problem called the Dynamic Coattention Network (DCN), which builds on a pretty intuitive idea.

Imagine I am going to give you a long text and then ask you a question. Would you prefer to read the text first and then be asked the question, or to be given the question before you start reading? Naturally, knowing the question in advance conditions you: you know what to pay attention to. Otherwise, you would have to pay equal attention to every detail and dependency, to cover for all possible future questions.

The DCN does the same thing. First, it generates an internal representation of the document conditioned on the question it is trying to answer, and then it iterates over a list of possible answers until it converges to a final answer.
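
A rough numpy sketch of the coattention idea behind that first step: the document representation is built while attending to the question. The shapes and naming here are our assumptions; the real model adds sentinel vectors, LSTM encoders and an iterative decoder:

```python
# Coattention sketch: fuse document and question encodings.
import numpy as np

def softmax(a, axis):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def coattention(D, Q):
    # D: (m, h) document encoding, Q: (n, h) question encoding
    L = D @ Q.T                    # (m, n) affinity between every word pair
    A_q = softmax(L, axis=0)       # attention over document words, per question word
    A_d = softmax(L, axis=1)       # attention over question words, per document word
    C_q = A_q.T @ D                # (n, h) question-aware document summaries
    # Each document word attends to the question and to those summaries:
    return A_d @ np.concatenate([Q, C_q], axis=1)   # (m, 2h)
```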

Machine Translation

In September, Google presented a new model used by its translation service, called Google Neural Machine Translation (GNMT). It is trained separately for each language pair, such as Chinese-English.

A new GNMT version was announced in November. It goes a step further by training a single model that is able to translate between multiple language pairs. The only difference from the previous model is that GNMT now takes an additional input specifying the target language. This also enables zero-shot translation, meaning that it can translate between a language pair it was never explicitly trained on.
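
The mechanism behind this is strikingly simple: an artificial token prepended to the source sentence tells the single shared model which language to produce. A toy sketch of the input preparation (the token format follows the paper; the translation model itself is assumed):

```python
# Multilingual NMT input preparation: prefix the target-language token.
def prepare_input(sentence, target_lang):
    return "<2{}> {}".format(target_lang, sentence)

prepare_input("Hello, how are you?", "es")
# -> "<2es> Hello, how are you?"  (same English input, Spanish output requested)
```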

The GNMT results show that training on multiple language pairs is better than training on a single pair, demonstrating that the model can transfer "translation knowledge" from one language pair to another.

Community

Several corporations and entrepreneurs have created non-profits and partnerships to discuss the future of Machine Learning and to make sure that these impressive technologies are used properly, for the benefit of everyone.

OpenAI is a non-profit organization that aims to collaborate with the research and industry communities, releasing its results to the public for free. It was created in late 2015 and started delivering its first results (publications like InfoGAN, platforms like Universe and (un)conferences like this one) in 2016. The motivation behind it is to make sure that AI technology is reachable by as many people as possible, and by doing so, to avoid the creation of AI superpowers.

On the other hand, the Partnership on AI was signed by Amazon, DeepMind, Google, Facebook, IBM and Microsoft. Its goal is to advance public understanding of the field, support best practices and develop an open platform for discussion and engagement.

Another aspect worth highlighting is the openness of the research community. Not only can you find almost any publication for free on sites like arXiv (or Arxiv-Sanity), but you can also replicate the experiments using the same code as the authors. One useful tool is GitXiv, which links arXiv papers to their open source project repositories.

Open source tools are everywhere (as we highlighted in our 10 main takeaways from MLconf SF blogpost). They are used and created by researchers and companies. Here is a list of the most popular tools in 2016 for Deep Learning:

  • TensorFlow by Google.
  • Keras by François Chollet.
  • CNTK by Microsoft.
  • MXNet by the Distributed (Deep) Machine Learning Community. Adopted by Amazon.
  • Theano by Université de Montréal.
  • Torch by Ronan Collobert, Koray Kavukcuoglu and Clement Farabet. Widely used by Facebook.

Final Thoughts

It is a great time to be part of the recent Machine Learning developments. As you can see, this year has been particularly exciting; research is moving at such a rapid pace that it is hard to keep up with the latest advancements. We are truly lucky to be living in an era where AI is being democratized.

At Tryolabs, we are working on some very interesting projects with these technologies. We promise to keep you posted on our findings and to continue sharing experiences with the industry and all the interested developers out there.

We covered a lot in this post, but there were many other great developments that we had to leave out. If you feel we have not done justice to some of them, please feel free to say so in the comments below!

Update (12/07/2016): follow the discussion of this post on Hacker News and /r/MachineLearning. There are a lot of awesome contributions!

