Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )

Link of the Paper: http://papers.nips.cc/paper/4470-im2text-describing-images-using-1-million-captioned-photographs.pdf

Main Points:

A large novel data set containing images from the web with associated captions written by people, filtered so that the descriptions are likely to refer to visual content.
A description generation method that utilizes global image representations to retrieve and transfer captions from their data set to a query image: authors achieve this by computing the global similarity ( a sum of gist similarity and tiny image color similarity ) of a query image to their large web-collection of captioned images; they find the closest matching image ( or images ) and simply transfer over the description from the matching image to the query image.
A description generation method that utilizes both global representations and direct estimates of image content (objects, actions, stuff, attributes, and scenes) to produce relevant image descriptions.

Other Key Points:

Image captioning will help advance progress toward more complex human recognition goals, such as how to tell the story behind an image.
An approach from Every picture tells a story: generating sentences for images produces image descriptions via a retrieval method, by translating both images and text descriptions to a shared meaning space represented by a single < object, action, scene > tuple. A description for a query image is produced by retrieving whole image descriptions via this meaning space from a set of image descriptions.
The retrieval method relies on collecting and filtering a large data set of images from the internet to produce a novel web-scale captioned photo collection.

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )的更多相关文章

Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
[Paper Reading]--Exploiting Relevance Feedback in Knowledge Graph
<Exploiting Relevance Feedback in Knowledge Graph> Publication: KDD 2015 Authors: Yu Su, Sheng ...
Paper Reading: Perceptual Generative Adversarial Networks for Small Object Detection
Perceptual Generative Adversarial Networks for Small Object Detection 2017-07-11 19:47:46 CVPR 20 ...
Paper Reading: In Defense of the Triplet Loss for Person Re-Identification
In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes ...
Paper Reading - Attention Is All You Need ( NIPS 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1706.03762 Motivation: The inherently sequential nature of ...
Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...
Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306 Main Points: An Alignment Model: Convolutional Ne ...
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...

随机推荐

HashMap源码阅读与解析
目录结构导入语 HashMap构造方法 put()方法解析 addEntry()方法解析 get()方法解析 remove()解析 HashMap如何进行遍历一.导入语 HashMap是我们最常见 ...
web前端时间戳转时间类型显示
1.jsp头部加:<%@ taglib prefix="fmt" uri="http://java.sun.com/jsp/jstl/fmt" %> ...
org.yaml.snakeyaml.parser.ParserException: while parsing a block mapping
org.yaml.snakeyaml.parser.ParserException: while parsing a block mapping 原因:yml文件格式错误,此文件要求严格要求格式如节 ...
linux学习笔记一：远程连接linux服务器
环境介绍:win7电脑,通过VM虚拟出linux系统,安装centOS7 通过Xshell连接linux,ftp访问服务器资源. 遇到的问题,ftp连不上linux 解决:linux上安装ftp服务 ...
13JavaScript运算符
运算符 = 用于给 JavaScript 变量赋值. 算术运算符 + 用于把值加起来实例指定变量值,并将值相加: y=5; z=2; x=y+z; 在以上语句执行后,x 的值是:7 1.JavaS ...
【DB2数据库在windows平台上的安装】
php添加数据库转义特殊字符串
addslashes()
c#开发微信公众号——关于c#对象与xml的转换
在成为微信公众号开发者以后,整个交互流程:用户->微信服务器->自己的服务器->返回微信服务器->用户: 举个例子:用户在微信公众号里面发了个“您好!”,微信服务器会以特定的x ...
基于EasyX库的贪吃蛇游戏——C语言实现
接触编程有段时间了,一直想学习怎么去写个游戏来练练手.在看了B站上的教学终于可以自己试试怎么实现贪吃蛇这个游戏了.好了,废话不多说,我们来看看如何用EasyX库来实现贪吃蛇. 一.准备工具vc++6 ...
uva 156 - Ananagrams (反片语)
csdn:https://blog.csdn.net/su_cicada/article/details/86710107 例题5-4 反片语(Ananagrams,Uva 156) 输入一些单词,找 ...

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )的更多相关文章

随机推荐

热门专题