Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306
Main Points:
- An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
- A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
- Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.


Other Key Points:
- The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.
Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)
https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...
- Paper Reading:Deep Neural Networks for YouTube Recommendations
论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...
- Paper Reading:Deep Neural Networks for Object Detection
发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...
- 论文笔记:Visual Semantic Navigation Using Scene Priors
Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper: https://arxiv.org/pdf/1810 ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
- 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)
这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...
随机推荐
- 转 Grand Central Dispatch 基础教程:Part 1/2 -swift
本文转载,原文地址:http://www.cocoachina.com/ios/20150609/12072.html 原文 Grand Central Dispatch Tutorail for S ...
- Linux---关闭Elasticsearch进程,并重新启动
有时候,当我们启动elasticsearch之后, 经过很长一段时间没有操作, 自己已经忘了是否已经启动了elasticsearch, 这时候我们可以通过下面的方式验证是否启动,并重新启动: step ...
- java ssm 后台框架平台 项目源码 websocket 即时通讯 IM quartz springmvc
官网 http://www.fhadmin.org/D 集成安全权限框架shiro Shiro 是一个用 Java 语言实现的框架,通过一个简单易用的 API 提供身份验证和授权,更安全,更可靠E ...
- PlanetLab介绍
转自http://blog.sina.com.cn/s/blog_83517c050100vyzq.html PlanetLab产生背景 随着计算机技术和通信技术的不断发展,Internet的商业化和 ...
- CSP 试题编号201803-2 Java实现
package HB; import java.util.Scanner; public class Test_06 { public static void main(String[] args) ...
- Xcode 提交APP时遇到 “has one iOS Distribution certificate but its private key is not installed”
解决办法:登录Apple开发证书后台,把发布版证书.cer文件下载到本地,双击安装即可.若还没有设置发布证书文件,则创建一个后下载. Ref: https://blog.csdn.net/dingqk ...
- HTML中放置CSS的三种方式和CSS选择器
(一)在HTML中使用CSS样式的方式一般有三种: 1 内联引用 2 内部引用 3 外部引用. 第一种:内联引用(也叫行内引用) 就是把CSS样式直接作用在HTML标签中. <p style ...
- 【转】Linux常用命令大全(非常全!!!)
最近都在和Linux打交道,感觉还不错.我觉得Linux相比windows比较麻烦的就是很多东西都要用命令来控制,当然,这也是很多人喜欢linux的原因,比较短小但却功能强大.我将我了解到的命令列举一 ...
- ajax与jsonp定义及使用方法
ajax 定义 ajax技术的目的是让javascript发送http请求,与后台通信,获取数据和信息. ajax通信的过程不会影响后续javascript的执行,从而实现异步. 同步和异步 现实生活 ...
- 基于visual studio 2017 以及cubemx 搭建stm32的开发环境(2)
主要解决 vs2017中,printf无法打印数据的问题. 在keil环境下正常使用printf功能,但是以下的重定向代码在vs2017下使用不了: #ifdef __GNUC__ /* With G ...