Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )

Link of the Paper: https://ieeexplore.ieee.org/document/7298856/

A Related Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)

Main Points:

A bi-directional mapping model using recurrent neural networks: unlike previous approaches, which map both sentences and images to a common embedding ( and then, I guess, calculate the similarity to match or rank them ), this model can generate novel sentences for an image. It may also be used for image search or for ranking image captions.
A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
A set of latent variables U_{t-1} that encodes the visual interpretation of the previously generated or read words W_{t-1}. Using U, the goal is to compute P(w_t | V, W_{t-1}, U_{t-1}) and P(V | W_{t-1}, U_{t-1}). Combining these two likelihoods, the global objective is to maximize P(w_t, V | W_{t-1}, U_{t-1}) = P(w_t | V, W_{t-1}, U_{t-1}) P(V | W_{t-1}, U_{t-1}). That is, the model maximizes the likelihood of the word w_t and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute P(w_t | V, W_{t-1}) and not P(V | W_{t-1}).
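
To make the factorization above concrete, here is a minimal Python sketch of the joint log-likelihood for a single time step. It is a toy under stated assumptions, not the paper's implementation: the word term is a softmax over made-up vocabulary scores, and P(V | W_{t-1}, U_{t-1}) is modeled as an isotropic Gaussian around a reconstructed feature vector `v_hat`. The Gaussian form, `sigma`, and all function and variable names are hypothetical choices for illustration.

```python
import math


def log_softmax(scores):
    """Log-probabilities over the vocabulary, computed stably."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def gaussian_log_density(v, v_hat, sigma=1.0):
    """Log-density of v under an isotropic Gaussian centred on v_hat.

    Assumption: this stands in for log P(V | W_{t-1}, U_{t-1}); the notes
    do not specify the form of the visual likelihood.
    """
    d = len(v)
    sq_err = sum((a - b) ** 2 for a, b in zip(v, v_hat))
    return -0.5 * sq_err / sigma ** 2 - 0.5 * d * math.log(2 * math.pi * sigma ** 2)


def joint_log_likelihood(word_scores, word_index, v, v_hat, sigma=1.0):
    """log P(w_t, V | ...) = log P(w_t | V, ...) + log P(V | ...)."""
    log_p_word = log_softmax(word_scores)[word_index]
    log_p_visual = gaussian_log_density(v, v_hat, sigma)
    return log_p_word + log_p_visual


# Toy inputs (all values invented for illustration).
word_scores = [2.0, 0.5, -1.0]   # unnormalized scores for a 3-word vocabulary
v = [0.2, 0.9]                   # observed visual features V
v_hat = [0.1, 1.0]               # features reconstructed from the words so far

caption_only = log_softmax(word_scores)[0]
joint = joint_log_likelihood(word_scores, 0, v, v_hat)
print("caption-only term:", caption_only)
print("joint objective:  ", joint)
```

The caption-only term corresponds to what the notes describe as the objective of previous papers; the joint objective adds the visual reconstruction term, so a better reconstruction `v_hat` raises the value being maximized.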

Other Key Points:

Because previous approaches project both semantics and visual features to a common embedding, they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.
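
A small sketch of what the common-embedding approach can do may help show the limitation. The code below ranks a fixed set of candidate captions by cosine similarity to an image embedding. The embeddings, captions, and function names are all invented for illustration, and cosine similarity is an assumed choice of similarity measure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return candidate captions sorted from most to least similar."""
    scored = [(cosine(image_embedding, emb), caption)
              for caption, emb in caption_embeddings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored]


# Hypothetical embeddings in a shared 3-dimensional space.
image_embedding = [0.9, 0.1, 0.3]
caption_embeddings = {
    "a dog runs on grass": [0.8, 0.2, 0.4],
    "a plate of food": [0.1, 0.9, 0.2],
    "a man rides a bike": [0.3, 0.3, 0.9],
}

ranking = rank_captions(image_embedding, caption_embeddings)
print(ranking)
```

Every result comes from the candidate set: the embedding supports matching and ranking, but nothing here maps a point in the embedding back to a new sentence or to visual features, which is the inverse projection the notes refer to.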
