Link of the Paper: https://arxiv.org/abs/1412.2306

Main Points:

  1. An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
  2. A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
  3. Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.

Other Key Points:

  1. The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章

  1. Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★

    Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...

  2. Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...

  3. Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)

    https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...

  4. Paper Reading:Deep Neural Networks for YouTube Recommendations

    论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...

  5. Paper Reading:Deep Neural Networks for Object Detection

    发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...

  6. 论文笔记:Visual Semantic Navigation Using Scene Priors

    Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper:  https://arxiv.org/pdf/1810 ...

  7. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  8. 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

    Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...

  9. 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)

    这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...

随机推荐

  1. 开源Webshell利用工具——Altman

    开源Webshell利用工具--Altman keepwn @ 工具 2014-06-04 共 6114 人围观,发现 43 个不明物体收藏该文 Altman,the webshell tool,自己 ...

  2. java 网站源码 六套模版 兼容手机平板PC freemaker 静态引擎 在线编辑模版

    官网 http://www.fhadmin.org/ 系统介绍: 1.网站后台采用主流的 SSM 框架 jsp JSTL,网站后台采用freemaker静态化模版引擎生成html 2.因为是生成的ht ...

  3. 简单的 Android 菜单

    Android 创建简单的菜单 一:上下文菜单: 1.在 res 下创建菜单项资源文夹 menu app->右击res->new->android resourse director ...

  4. execute immediate

    首先在这里发发牢骚,指责下那些刻板的书写方式,不考虑读者理不理解,感觉就是给专业人员用来复习用的一样,没有前戏,直接就高潮,实在受不了!没基础或基础差的完全不知道发生了什么,一脸懵逼的看着,一星差评! ...

  5. SSM(SpringMVC+Spring+Mybatis)框架学习理解

    近期做到的项目中,用到的框架是SSM(SpringMVC+Spring+Mybatis).之前比较常见的是SSH.用到了自然得了解各部分的分工 spring mvc 是spring 处理web层请求的 ...

  6. 树莓派3B+SimpleCV上连接iPhone4s摄像头

    目的:把iPhone4s当成网络摄像头,通过wifi连接到树莓派上,做为树莓派的摄像头. 1. iPhone4s上安装mini WebCam应用. 很旧的一个app, 没有密码,简单,无广告,免费. ...

  7. python 爬虫基础知识(继续补充)

    学了这么久爬虫,今天整理一下相关知识点,还会继续更新 HTTP和HTTPS HTTP协议(HyperText Transfer Protocol,超文本传输协议):是一种发布和接收 HTML页面的方法 ...

  8. 20155222 2016-2017-2《Java程序设计》课程总结

    20155222 2016-2017-2<Java程序设计>课程总结 每周作业链接汇总 预备作业1:期望的师生关系 预备作业2:技能获取与语言学习 预备作业3:安装虚拟机及学习linux系 ...

  9. 2017-2018-1 《信息安全技术》实验二——Windows口令破解

    2017-2018-1 <信息安全技术>实验二--Windows口令破解 所用工具 系统:能勾起我回忆的Windows 2003 工具:LC5.SuperDic Windows口令破解 口 ...

  10. 20155232 2016-2017-3 《Java程序设计》第4周学习总结

    20155232 2016-2017-3 <Java程序设计>第4周学习总结 教材学习内容总结 第六章 继承与多态 所谓继承就是避免多个类间重复定义共同行为. 1.重复在程序设计上就是不好 ...