Link of the Paper: https://ieeexplore.ieee.org/document/7298856/

A Correlative Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)

Main Points:

  1. A bi-directional mapping model using recurrent neural networks: unlike previous approaches which map both sentences and images to a common embedding ( and then calculate the similarity and match / generate, I guess ) that may be used for image search or for ranking image captions.
  2. A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
  3. A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
  4. A set of latent variables Ut-1 that encodes the visual interpretation of the previously generated or read words Wt-1. Using U, our goal is to compute P(wt | V, Wt-1, Ut-1) and P(V | Wt-1, Ut-1). Combining these two likelihoods together our global objective is to maximize, P(wt, V | Wt-1, Ut-1) = P(wt | V, Wt-1, Ut-1)P(V | Wt-1, Ut-1). That is, we want to maximize the likelihood of the word wt and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute P(wt | V, Wt-1) and not P(V | Wt-1).

Other Key Points:

  1. Previous approaches project both semantics and visual features to a common embedding, they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.

Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )的更多相关文章

  1. Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★

    Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...

  2. Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...

  3. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  4. Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning

    Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...

  5. Paper Reading: In Defense of the Triplet Loss for Person Re-Identification

    In Defense of the Triplet Loss for Person Re-Identification  2017-07-02  14:04:20   This blog comes ...

  6. CVPR 2016 paper reading (6)

    1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...

  7. 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

    Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...

  8. 论文笔记:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

    Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...

  9. 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction

    Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...

随机推荐

  1. linux下安装swoole扩展

    一.下载swoole 地址:https://github.com/swoole/swoole-src二.将下载好的压缩包放在linux服务器下三.解压压缩包到任意目录 # unzip swoole-s ...

  2. 蓝桥杯第七届决赛(国赛)C++B组 第四题 机器人塔

    机器人塔 X星球的机器人表演拉拉队有两种服装,A和B.他们这次表演的是搭机器人塔. 类似: A    B B   A B A  A A B B B B B A BA B A B B A 队内的组塔规则 ...

  3. ABAP术语-XML

    XML 原文:http://www.cnblogs.com/qiangsheng/archive/2008/03/21/1115743.html The "eXtensible Markup ...

  4. JavaWeb日常笔记

    1.   XML文档的作用和解析 1. XML的基本概述: XML的主要是用来存储一对多的数据,另外还可以用来当做配置文件存储数据.XML的表头如下: <?xml version='1.0' e ...

  5. python if-elif-else 结构判断输入值处于何种年龄段

    输入变量 age 的值,再编写一个 if-elif-else 结构,根据 age的值判断处于人生的哪个阶段.如果一个人的年龄小于 2岁,就打印一条消息,指出他是婴儿.如果一个人的年龄为 2(含)-4岁 ...

  6. Angular 弹窗 控件

    这个控件个人很喜欢,比起primgNG等弹窗组建,这款弹窗可以很轻松的定义自己的样式和布局. 可控参数有:宽度,高度,是否带有关闭图标,基本满足基础弹窗需求. 并且 Title/Content/Foo ...

  7. 帝国CMS给会员注册加入问答验证

    修改文件有e/enews/index.php //注册 elseif($enews=="register") { if($_POST['ask']=='帝国软件') { $user ...

  8. 与“零值”作比较的 if 语句。

    笔试时候遇到的问题,在此做一下记录. 1.if语句中的布尔变量与零值作比较 不能用布尔变量与true,false,1,0直接作比较.布尔变量类型的语义是:零值为“假”,任何非零值都表示“真”.因为tr ...

  9. Asp.Net Core存储Cookie不成功

    Asp.Net Core存储Cookie不成功 Asp.Net Core2.1生成的项目模板默认实现了<>,所以设置存储Cookie需要做一些处理. 1.第一种是在Startup的Conf ...

  10. 20155212 mybash的实现

    mybash的实现 题目 使用fork,exec,wait实现mybash 写出伪代码,产品代码和测试代码 发表知识理解,实现过程和问题解决的博客(包含代码托管链接) 准备 通过man命令了解fork ...