Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )

Link of the Paper: http://papers.nips.cc/paper/4470-im2text-describing-images-using-1-million-captioned-photographs.pdf

Main Points:

A large novel data set containing images from the web with associated captions written by people, filtered so that the descriptions are likely to refer to visual content.
A description generation method that utilizes global image representations to retrieve and transfer captions from their data set to a query image: authors achieve this by computing the global similarity ( a sum of gist similarity and tiny image color similarity ) of a query image to their large web-collection of captioned images; they find the closest matching image ( or images ) and simply transfer over the description from the matching image to the query image.
A description generation method that utilizes both global representations and direct estimates of image content (objects, actions, stuff, attributes, and scenes) to produce relevant image descriptions.

Other Key Points:

Image captioning will help advance progress toward more complex human recognition goals, such as how to tell the story behind an image.
An approach from Every picture tells a story: generating sentences for images produces image descriptions via a retrieval method, by translating both images and text descriptions to a shared meaning space represented by a single < object, action, scene > tuple. A description for a query image is produced by retrieving whole image descriptions via this meaning space from a set of image descriptions.
The retrieval method relies on collecting and filtering a large data set of images from the internet to produce a novel web-scale captioned photo collection.

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )的更多相关文章

Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
[Paper Reading]--Exploiting Relevance Feedback in Knowledge Graph
<Exploiting Relevance Feedback in Knowledge Graph> Publication: KDD 2015 Authors: Yu Su, Sheng ...
Paper Reading: Perceptual Generative Adversarial Networks for Small Object Detection
Perceptual Generative Adversarial Networks for Small Object Detection 2017-07-11 19:47:46 CVPR 20 ...
Paper Reading: In Defense of the Triplet Loss for Person Re-Identification
In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes ...
Paper Reading - Attention Is All You Need ( NIPS 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1706.03762 Motivation: The inherently sequential nature of ...
Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...
Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306 Main Points: An Alignment Model: Convolutional Ne ...
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...

随机推荐

Android开发之自己定义TabHost文字及背景(源码分享)
使用TabHost 能够在一个屏幕间进行不同版面的切换,而系统自带的tabhost界面较为朴素,我们应该怎样进行自己定义改动优化呢 MainActivity的源码 package com.dream. ...
html input file accept
*.3gpp audio/3gpp, video/3gpp 3GPP Audio/Video*.ac3 audio/ac3 AC3 Audio*.asf allpication/vnd.ms-asf ...
javascript json对象操作（基本增删改查）
/** * Json对象操作,增删改查 * * @author lellansin * @blog www.lellansin.com * @version 0.1 * * 解决一些常见的问题 * g ...
Golddata如何采集需要登录/会话的数据？
概要本文将介绍使用GoldData半自动登录功能,来采集需要登录网站的数据.GoldData半自动登录功能,就是指通过脚本来执行登录,如果需要验证码或者其它内容需要人工输入时,可以通过收发邮件来执行 ...
20155338 2016-2017-2 《Java程序设计》第4周学习总结
20155338 2016-2017-2 <Java程序设计>第4周学习总结教材学习内容总结内容有很多,这里这是选取几个比较重要的一部分来展示. 1.继承 •定义:继承基本上就是避免多 ...
【转载】D3DXVec3TransformNormal and D3DXVec3TransformCoord
原文:D3DXVec3TransformNormal and D3DXVec3TransformCoord D3DXVec3TransformCoord 对向量进行变换,没啥好说明的,默认向量为行向量 ...
【LOJ10121】与众不同
[LOJ10121]与众不同题面 LOJ 题解这题是_\(tham\)给\(ztl\)他们做的,然而这道题™居然还想了蛮久... 首先可以尺取出一个位置\(i\)上一个合法的最远位置\(pre_i ...
2288: 【POJ Challenge】生日礼物
2288: [POJ Challenge]生日礼物 https://lydsy.com/JudgeOnline/problem.php?id=2288 分析: 贪心+堆+链表. 首先把序列变一下,把相 ...
Drupal学习(19) 使用jQuery
本节学习如果在Drupal里交互使用jQuery. jQuery在Drupal是内置支持的.存在根目录的misc目录中. 当调用drupal_add_js方法,会自动加载jQuery. 在Drupal ...
创龙OMAPL138开发板测试（1）
1. 里面的DSP内核是否能单独使用?先测试一个LED灯的例程先,仿真器连接上开发板,显示有C6748和PRU还有ARM9.对了,板子的拨码开关要01111,是DEBUG模式才可以. 2. 下载一下. ...

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )

Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )的更多相关文章

随机推荐

热门专题