Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )

Link of the Paper: https://arxiv.org/abs/1412.2306

Main Points:

An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels I_b inside each bounding box -> a set of h-dimensional vectors {v_i | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.

Other Key Points:

The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章

Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
Deep Visual-Semantic Alignments for Generating Image Descriptions（深度视觉-语义对应对于生成图像描述）
https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...
Paper Reading:Deep Neural Networks for YouTube Recommendations
论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...
Paper Reading:Deep Neural Networks for Object Detection
发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...
论文笔记：Visual Semantic Navigation Using Scene Priors
Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper: https://arxiv.org/pdf/1810 ...
Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
论文笔记：Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
论文：利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)
这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...

随机推荐

flask中的蓝图实现模块化的应用
Blueprint 蓝图的基本设想是当它们注册到应用上时,它们记录将会被执行的操作. 当分派请求和生成从一个端点到另一个的 URL 时,Flask 会关联蓝图中的视图函数. 简单来说,Blueprin ...
MyEclipse报错：com.mysql.jdbc.exceptions.jdbc4.CommunicationsException Communications link failure
数据库服务没有开或者是驱动那块的问题
一图看懂hadoop Yarn工作原理
Hadoop 资源调度框架Yarn运行流程
树莓派3B+学习笔记：7、挂载exfat格式U盘
树莓派的官方系统,默认不支持exfat格式U盘挂载. 插入exfat格式U盘会出现以下错误提示: 安装exfat-fuse后可以正常识别,需要在命令行执行以下命令,按“y”键回车确认: sudo ap ...
第八周课上额外项目：pwd的实现
项目要求: 1 学习pwd命令 2 研究pwd实现需要的系统调用(man -k; grep),写出伪代码 3 实现mypwd 4 测试mypwd 并且上交博客链接. 实验步骤我首先不懂pwd到底是个 ...
20155333 2016-2017-2 《Java程序设计》第一周学习总结
<java程序设计>第一周学习总结学习目标 •了解java基础知识 •了解JVM.JRE与JDK,并下载.安装.测试JDK •了解PATH.CLASSPATH.SOURCEPATH的作用 ...
成都Uber优步司机奖励政策（4月17日）
滴快车单单2.5倍,注册地址:http://www.udache.com/ 如何注册Uber司机(全国版最新最详细注册流程)/月入2万/不用抢单:http://www.cnblogs.com/mfry ...
day6 break continue for
.for .break (整个while循环全部结束) )打印1-100的偶数.py )打印1-100的20个偶数.py )while嵌套中的break (就近原则) .continue 错误用法: ...
【Loj10222】佳佳的Fibonacci
题面题解可以发现\(T(n)\)无法用递推式表示. 于是我们做如下变形: \[ T(n) = \sum _ {i = 1} ^ n i \times f_i \\ S(n) = \sum _ {i ...
SaltStack入门篇（五）之salt-ssh的使用以及LAMP状态设计部署
1.salt-ssh的使用官方文档:https://docs.saltstack.com/en/2016.11/topics/ssh/index.html ()安装salt-ssh [root@li ...

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章

随机推荐

热门专题