How Does #DeepDream Work?
Do neural networks dream of electric dogs?
If you’ve been browsing the web recently, you might have stumbled on some strange-looking images, full of pieces of dog heads, eyes, legs and what look like buildings, sometimes superimposed on a normal photo, sometimes not. Although they can be nightmare-inducing (or perhaps because of that), they have gained a lot of popularity on the Internet. Often tagged #deepdream, they are made by a neural network that was trained on a huge set of categorized images and then set free to generate new ones. The network comes from Google Research, and its code is currently available on GitHub, spawning ever more home-made neural image generators.
It turns out, like many things on the Internet, it has something to do with cats.
In 1971, a British scientist named Sir Colin Blakemore raised a kitten in complete darkness, except for several hours a day in a small cage, where the kitten could only see black and white horizontal stripes. A month or so later, the kitten was introduced to the normal world. It reacted to light, but otherwise seemed to be blind. It didn’t follow moving objects, unless they made a sound. When a recording was taken from the kitten’s visual cortex, it turned out that its neurons reacted to horizontal lines, but not vertical, so the brain was unable to comprehend the complexity of the real world. Another kitten raised in a vertical stripe environment had a similar disability.
The research method is controversial and Sir Colin had his share of threats from angry animal defenders, but the result is interesting. The experiment tells us that the vision system, at least in cats, is something that develops after birth. The visual cortex of a kitten adapts to what the newborn eyes are exposed to and forms neurons that react to dominant basic patterns, like vertical or horizontal lines. From those basic patterns it can then make sense of a more complex image, infer depth and motion. This gives us hope and suggests a method for creating an artificial vision system.
Convolutional neural networks
A feed-forward artificial neural network, when used as a classifier, takes an array of values as input and tries to assign it to a category in response. A relatively small network can be trained to guess a person’s gender given two numbers: height and hair length. We collect some sample data, present the samples to the network, and gradually modify the network’s weights to minimize the error. The network should then find the general rule and give correct answers to samples it has never seen before. If well trained, it will be correct most of the time, wrong mostly in cases where people given only the same two numbers would probably be wrong too, and much faster than us, because people are not good with numbers. Estimating gender from a photo, in turn, is a much easier task for humans, but orders of magnitude harder for computers. An image, represented as a set of numerical pixel values, is a huge input space, impossible for such a network to process in reasonable time. And what if we don’t want to detect genders, but dog breeds, or recognize plants, or find cancer in x-rays? This is where convolutional neural networks come in.
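The height-and-hair-length setup above can be sketched as a single trainable neuron. This is a toy illustration with synthetic, made-up data (not the article's actual experiment), trained by plain gradient descent on the logistic loss:

```python
import numpy as np

# Toy sketch (synthetic data, purely illustrative): one "neuron" learns to
# guess a binary label from two numbers, height (cm) and hair length (cm).
rng = np.random.default_rng(0)
n = 200
height = np.where(rng.random(n) < 0.5, rng.normal(178, 7, n), rng.normal(165, 7, n))
label = (height > 171).astype(float)          # arbitrary synthetic labeling
hair = np.where(label == 1, rng.normal(8, 4, n), rng.normal(25, 10, n))
X = np.column_stack([height, hair])
X = (X - X.mean(0)) / X.std(0)                # normalize the inputs

w, b = np.zeros(2), 0.0
for _ in range(500):                          # gradient descent on log-loss
    p = 1 / (1 + np.exp(-(X @ w + b)))        # sigmoid activation
    grad = p - label                          # error signal
    w -= 0.1 * X.T @ grad / n                 # adjust weights to reduce error
    b -= 0.1 * grad.mean()

p = 1 / (1 + np.exp(-(X @ w + b)))
accuracy = ((p > 0.5) == (label == 1)).mean() # correct most of the time
```

After training, the two learned weights are the network's "general rule": a decision boundary in the height/hair-length plane.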
Convolutional neural networks are a breed of neural networks introduced by Kunihiko Fukushima back in 1980, under the name “Neocognitron”. You may stumble across the name “LeNet”, named after Yann LeCun, a researcher working with Facebook.
A kernel convolution is an image operation that, for each pixel, takes the pixels in its square neighborhood, calculates a weighted average of their values, and writes the result as the new value of that pixel. An equally-weighted average will produce a blurry image. Negative weights for the nearest neighbors will make the image look sharper. With properly selected weights, we can enhance vertical or horizontal lines, or lines at any angle. With a bigger convolution kernel (neighborhood), we can find curved lines, color gradients or simple patterns. This technique has been known and used in image analysis and manipulation for years. In convolutional neural networks, though, the weights in the convolution matrices can be trained using error back-propagation. This way, instead of each pixel being an input, we can have one neuron reacting only to vertical lines, another to horizontal lines, or angles, just like in a cat’s visual cortex.
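The operation described above can be written out directly. A minimal sketch (not the network's actual code) with three classic 3x3 kernels:

```python
import numpy as np

# For each pixel, take its 3x3 neighborhood and compute a weighted sum.
# Edge pixels reuse their nearest neighbors ("edge" padding).
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2,) * 2, (kw // 2,) * 2), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

blur = np.ones((3, 3)) / 9.0                 # equal weights -> blurry image
sharpen = np.array([[ 0, -1,  0],            # negative nearest-neighbor
                    [-1,  5, -1],            # weights -> sharper image
                    [ 0, -1,  0]], dtype=float)
vertical = np.array([[-1, 0, 1],             # responds to vertical edges
                     [-1, 0, 1],
                     [-1, 0, 1]], dtype=float)
```

A convolutional layer learns kernels like these instead of having them hand-picked.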
A layer of a convolutional neural network consists of a number of such image-transforming neurons, each emphasizing a different aspect of the image. The image then becomes split into a number of feature channels: instead of the initial RGB, we get one channel per kernel. That is hardly a solution to the input size problem; we use pooling layers for that. A pooling layer takes non-overlapping square neighborhoods of pixels, finds the highest value in each, and returns that as the value of the neighborhood. Note that if we did this on the original image, we’d just get a badly pixelated, somewhat brighter miniature. But when a pooling layer’s input comes from a convolutional layer, its response means “this feature occurs somewhere in this area”, which is actually useful information. Pooling layers also make the network less sensitive to where in the image the features are.
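Max pooling is simple enough to show in a few lines (an illustrative sketch, assuming 2x2 non-overlapping neighborhoods):

```python
import numpy as np

# Replace each non-overlapping 2x2 block by its highest value, halving the
# resolution while keeping "this feature occurs somewhere in this area".
def max_pool(channel, size=2):
    h, w = channel.shape
    h, w = h - h % size, w - w % size        # drop any ragged edge rows/cols
    blocks = channel[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))
```

On a 4x4 input this returns a 2x2 output: one maximum per quadrant.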
“Park or Bird” problem by XKCD
Still, knowing that there are vertical or horizontal lines, gradients or edges doesn’t help us detect whether the photo contains a bird (see the famous “park or bird” problem, finally solved by Flickr). But we have made a step in the right direction, so why not take the next step: stack another set of convolutional neurons on top, and a pooling layer on top of that, creating a deep learning neural network. It turns out that, with enough layers, a network “gets” quite complex features. A face is, after all, a combination of eyes, nose and mouth, with a chance of ears and hair. We can then use these complex features as input for a regular feed-forward neural network and train it to return a category: bird, dog, building, electric guitar, school bus or pagoda.
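The stacking itself can be sketched end to end. This is a structural illustration only: the kernels and readout weights below are random placeholders, where a real network would learn them by back-propagation:

```python
import numpy as np

# Sketch of the stack: convolution -> pooling -> convolution -> pooling,
# then a plain feed-forward readout over the resulting "complex features".
def conv(img, k):
    p = k.shape[0] // 2
    pad = np.pad(img, p, mode="edge")
    return np.array([[np.sum(pad[i:i + k.shape[0], j:j + k.shape[1]] * k)
                      for j in range(img.shape[1])]
                     for i in range(img.shape[0])])

def pool(img, s=2):
    h, w = img.shape[0] - img.shape[0] % s, img.shape[1] - img.shape[1] % s
    return img[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((16, 16))                  # a tiny stand-in "photo"
k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

x = pool(np.maximum(conv(image, k1), 0))      # layer 1: lines and edges
x = pool(np.maximum(conv(x, k2), 0))          # layer 2: combinations of them
features = x.ravel()                          # 4x4 map -> 16 feature values
W = rng.normal(size=(5, features.size))       # feed-forward readout:
scores = W @ features                         # one score per category
```

Each extra conv+pool pair shrinks the map and widens the receptive field, which is why deeper layers can respond to eyes, noses, and whole faces.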
In 2012, a convolutional neural network was trained on YouTube videos and allowed to freely self-organize categories for what it could see. It self-organized a category of cats (see, it all goes back to cats again). In 2014, convolutional networks, working together with recurrent neural networks trained on full sentences, learned to describe images in full sentences. Recurrent neural networks, though, are a topic for another time.
Source: Deep Fragment Embeddings for Bidirectional Image-Sentence Mapping by Andrej Karpathy
Dreaming deep
Since 2012, deep learning neural networks have been winning image analysis competitions, reaching near-human accuracy in labeling images. Initially, there was some resistance from the computer vision community. One complaint was that the networks were winning but, well, not showing their work. The leading methods at the time, when detecting a face, for example, would give exact positions of eyes and mouth and return various proportions of the purported face. A deep neural network would just detect faces with uncanny accuracy and not tell us how. While the inner workings of the first convolutional layer were well understood (lines, edges, gradients), it was impossible to look into how the second layer used the information given by the first. The layers above that were a mystery.
Researchers have been successful in fooling these deep neural networks by generating images with an evolutionary algorithm. Make a random noise image and check the network’s response. If the network thinks there’s a chance of a bus in the image, generate more images like it (offspring) and feed those to the network again. Whichever image increases the network’s certainty wins and gets to contribute to the next generation. What still looks like random noise to us will be interpreted as a bus by the network. If we also optimize the generated images to have the statistical properties of real samples, the noise turns into shapes and textures that we can identify. Such generated images tell us, for example, that stripes are a key feature of bees, that bananas are usually yellow, and that anemone fish have orange-white-black stripes.
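The loop just described can be sketched in a few lines. The real work queries a trained network for its class confidence; here a stand-in scoring function plays that part (it simply rewards images whose mean brightness approaches an arbitrary target), so only the evolutionary structure is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

def network_confidence(img):
    # Placeholder for the trained network's "this is a bus" score.
    return -abs(img.mean() - 0.75)

# Start from random noise images and evolve toward higher confidence.
population = [rng.random((8, 8)) for _ in range(20)]
for generation in range(100):
    winners = sorted(population, key=network_confidence, reverse=True)[:5]
    population = [p + rng.normal(0, 0.05, p.shape)    # mutated offspring
                  for p in winners for _ in range(4)]

best = max(population, key=network_confidence)        # still noise to us
```

Against a real classifier, `best` would be an image the network calls a bus with high confidence while remaining meaningless to a human.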
Another approach is to feed a real image to the network, pick a layer, apply the winning transformation to each part of the image, and feed the result back to the network. For the first layer, as expected, the network will enhance the leading line directions in the image, giving it an impressionistic style of wavy brush strokes, dots or swirls. If we do that with a higher layer (reversing the process through the lower layers), we get an image painted with textures: fur, wood, feathers, scales, bricks, grass, waves, spaghetti or meatballs. Yet higher layers will turn any vertical lines into legs and draw eyes and noses on shapes that vaguely resemble faces (people do that too, by the way; it’s called pareidolia). Since there are a lot of animal photos in the training set, the generated images often contain animal parts, sometimes gruesomely distorted. Animal faces are superimposed on human faces, spiky leaves become bird beaks and everything seems a bit hairy. Our brains struggle to make sense of the network’s dreams. The images are disturbing because they remind us of something, but at the same time, not exactly. This makes them interesting, maybe unsettling, and viral on the Internet.
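This feedback loop is, at its core, gradient ascent on a layer's response. A miniature sketch (illustrative, not Google's code): a single random linear map stands in for "the network up to the chosen layer", so the gradient of the response has a closed form and no autograd library is needed:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))                 # stand-in for the chosen layer
image = rng.normal(size=64) * 0.01            # start from a faint image
start_response = np.linalg.norm(W @ image)

for step in range(100):
    activations = W @ image                   # forward pass up to the layer
    # objective = 0.5 * ||activations||^2; its gradient w.r.t. the image:
    grad = W.T @ activations
    image += 0.01 * grad / (np.abs(grad).max() + 1e-8)  # normalized ascent step

final_response = np.linalg.norm(W @ image)    # the layer now responds strongly
```

In the real thing, `W` is a deep stack of convolutions and the gradient is obtained by back-propagating the layer's activations down to the pixels, which is what "reversing the process through the lower layers" refers to.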
What does that say about the history of art, if a lower layer paints a Van Gogh or Seurat picture, while a higher level layer reminds us of Picasso or Dali?
Left: Original photo by Zachi Evenor. Right: processed by Günther Noack, Software Engineer
Going deeper
Things get really interesting if we directly suggest to the network what it should see, by triggering the final classification layer. I will leave the explanation of how that trigger is passed back through the network to those who have done it. Instead of starting with a random image, it helps to take a real image, blur it a bit, zoom in, and let the network “enhance” it by drawing what it sees. When we repeat the process, we get a potentially infinite zooming sequence of bits and pieces of the network’s sample data.
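The blur-zoom-enhance loop itself is easy to sketch. Here a toy contrast boost stands in for the network's "enhance" step, since only the loop structure is being illustrated:

```python
import numpy as np

def blur(img):
    # Average each interior pixel with its 4 neighbors: a mild blur.
    out = img.copy()
    out[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] + img[1:-1, 1:-1]) / 5
    return out

def zoom(img, crop=4):
    inner = img[crop:-crop, crop:-crop]       # crop the center...
    idx = (np.arange(img.shape[0]) * inner.shape[0]) // img.shape[0]
    return inner[np.ix_(idx, idx)]            # ...and resize it back up

def enhance(img):
    # Placeholder for the network pass that "draws what it sees".
    return np.clip((img - img.mean()) * 1.2 + img.mean(), 0.0, 1.0)

frame = np.random.default_rng(0).random((64, 64))
frames = [frame]
for _ in range(10):
    frame = enhance(zoom(blur(frame)))
    frames.append(frame)                      # a potentially endless sequence
```

Played back in order, `frames` is the endlessly-zooming dream sequence: each iteration dives deeper into whatever the "enhance" step drew in the previous one.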
This is how the Large Scale Deep Neural Network (LSD-NN) is able to hallucinate like that in real time. It was made by Jonas Degrave and team, known as 317070 on GitHub. Interestingly, this one was built before Google published the DeepDream code. 317070’s network hallucinates in a Twitch stream, where users can shout categories from the ImageNet database and the network will do its best to produce images that remind it of what the user suggested. The network doesn’t exactly draw the requested objects, but it gets the essence. When users ask for volcanoes, there’s smoke and lava. When users shout for pizza, there’s melting cheese. You can see sausages in a butcher shop and spiders on spiderwebs, but most of the time it’s a mesmerizing, colorful soup of textures and shapes. Really. Try it.
It may be called fun or nightmarish, but we learn a lot from making networks dream. We have found the equivalent of the first convolutional layer in cat brains. We have a model which confirms the theory that dreaming helps us remember (networks can be trained on their own generated sample data). We basically simulated pareidolia, perhaps we can infer some information about the mechanism behind schizophrenia.
As with chess and, more recently, Jeopardy, machines have crossed yet another threshold and taken over something we used to be better at. Remember when captchas were a good way to stop crawlers and bots from stealing your online data? Not anymore; now they are just a test of the bot’s computational ability. The viral images under the #deepdream hashtag are a sign that machine vision is becoming mainstream, and it’s time to accept that dreaming of electric sheep (or dogs) is just something androids do.