Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306
Main Points:
- An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
- A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
- Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.
Other Key Points:
- The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.
Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)
https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...
- Paper Reading:Deep Neural Networks for YouTube Recommendations
论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...
- Paper Reading:Deep Neural Networks for Object Detection
发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...
- 论文笔记:Visual Semantic Navigation Using Scene Priors
Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper: https://arxiv.org/pdf/1810 ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
- 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)
这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...
随机推荐
- maven多模块部署(转载)
[Maven]使用Maven构建多模块项目 Maven多模块项目 Maven多模块项目,适用于一些比较大的项目,通过合理的模块拆分,实现代码的复用,便于维护和管理.尤其是一些开源框架,也是采用多模块的 ...
- Eclipse中按CTRL键点击类不能进入
是因为Eclipse或项目没有关联jdk,首先看window->preferences->java->Installed JREs,看是不是关联的你所安装的jdk,有的是关联的JRE ...
- 线程队列-queue
使用队列的目的: 解耦,使程序之间实现松耦合:提高处理效率 FIFO = 先进先出,first in first out LIFO = 后入先出,last in first out 生产者消费 ...
- C++编译器是如何管理类和对象的,类的成员函数和成员变量
C++中的class从面向对象理论出发,将变量(属性)和函数(方法)集中定义在一起,用于描述现实世界中的类.从计算机的角度,程序依然由数据段(栈区内存)和代码段(代码区内存)构成. #include ...
- ABAP术语-Update Data
Update Data 原文:http://www.cnblogs.com/qiangsheng/archive/2008/03/20/1114169.html The data which is t ...
- 请问在一个命令上加什么参数可以实现下面命令的内容在同一行输出。 echo "zhaokang";echo "zhaokang"
请问在一个命令上加什么参数可以实现下面命令的内容在同一行输出. echo "zhaokang";echo "zhaokang" [root@zhaokang t ...
- js数组的处理使用
var users = [ {name: "张含韵", "email": "zhang@email.com"}, {name: " ...
- [转]windows下多个python版本共存,pip使用
windows下多个python版本共存,pip使用 2017年09月13日 17:21:30 阅读数:2574 一.同时装了Python3和Python2,怎么区分 了解python的人都知道pyt ...
- Delphi Android下包含第三方DEX
1.将jar转换为dex call dx --dex -verbose --output=.\output\dex\test_classes.dex --positions=lines .\outpu ...
- IOTutility 一个轻量级的 IOT 基础操作库
IOTutility 一个轻量级的 IOT 基础操作库 Base utility for IOT devices, networking, controls etc... IOTutility 的目的 ...