Link of the Paper: https://arxiv.org/abs/1412.2306

Main Points:

  1. An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
  2. A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
  3. Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.

Other Key Points:

  1. The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章

  1. Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★

    Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...

  2. Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...

  3. Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)

    https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...

  4. Paper Reading:Deep Neural Networks for YouTube Recommendations

    论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...

  5. Paper Reading:Deep Neural Networks for Object Detection

    发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...

  6. 论文笔记:Visual Semantic Navigation Using Scene Priors

    Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper:  https://arxiv.org/pdf/1810 ...

  7. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  8. 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

    Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...

  9. 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)

    这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...

随机推荐

  1. 算法学习记录-查找——平衡二叉树(AVL)

    排序二叉树对于我们寻找无序序列中的元素的效率有了大大的提高.查找的最差情况是树的高度.这里就有问题了,将无序数列转化为 二叉排序树的时候,树的结构是非常依赖无序序列的顺序,这样会出现极端的情况. [如 ...

  2. jdk 下载地址 服务器

    https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

  3. GCD 多线程技术

    Grand Central Dispatch(GCD)是异步执行任务的技术之一.一般将应用程序中记述的线程管理用 的代码在系统级中实现.开发者只需要定义想执行的任务并追加到适当的Dispatch Qu ...

  4. koa2学习笔记03 - 给koa2配置session ——koa2结构分层、配置数据库、接口

    前言 这一章写的很没有底气,因为我完全不懂一个正经的后台应用是怎么结构分层的, 所有只能按照我自己的理解去写,即使这样也仅仅只分离出了controller层, 至于所谓的service层,dao层,完 ...

  5. 前端使用mobx时,变量已经修改了,为什么组件还是没变化,map类型变量,对象类型变量的值获取问题(主要矛盾发生在组件使用时)

    前天我在使用一个前端多选框组件时遇到了一个问题,明明对象内的值已经修改了,但是组件显示的还是没有效果改变,以下是当时打出的log,我打印了这个对象的信息 对象内的值已经修改了但是组件还是不能及时更改, ...

  6. Redis(一):NoSQL入门和概述

    NoSQL入门和概述目录导航: NoSQL入门概述 3V+3高 当下的NoSQL经典应用 NoSQL数据模型简介 NoSQL数据库的四大分类 在分布式数据库中CAP原理CAP+BASE NoSQL 入 ...

  7. Redis推荐阅读笔记整理

    Herrt灬凌夜    https://www.cnblogs.com/wuyx/archive/2018/03.html 6. Redis_常用5大数据类型简介 5. redis_安装 4. Red ...

  8. zookeeper无法启动"Unable to load database on disk

    QuorumPeerMain,ResourceManager都没有起来 resourcemanager.log如下 2018-09-28 23:17:02,787 FATAL org.apache.h ...

  9. Debian Linux 下安装pip3

    由于Debian自带了python3,于是只需要安装pip 1.首先安装setuptools 下载wget --no-check-certificate https://pypi.python.org ...

  10. EJS 模板中,js 如何获取后端传来的数据

    在 ejs 模板中,想让 js 的代码获得后端传来的数据,要在<%=%>的外面加一对引号. 如下图,从后端给 chatroom.ejs 传进去一个 avatar 变量,是个字符串类型的. ...