Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )
Link of the Paper: http://papers.nips.cc/paper/4470-im2text-describing-images-using-1-million-captioned-photographs.pdf
Main Points:
- A large novel data set containing images from the web with associated captions written by people, filtered so that the descriptions are likely to refer to visual content.
- A description generation method that utilizes global image representations to retrieve and transfer captions from their data set to a query image: authors achieve this by computing the global similarity ( a sum of gist similarity and tiny image color similarity ) of a query image to their large web-collection of captioned images; they find the closest matching image ( or images ) and simply transfer over the description from the matching image to the query image.
- A description generation method that utilizes both global representations and direct estimates of image content (objects, actions, stuff, attributes, and scenes) to produce relevant image descriptions.

Other Key Points:
- Image captioning will help advance progress toward more complex human recognition goals, such as how to tell the story behind an image.
- An approach from Every picture tells a story: generating sentences for images produces image descriptions via a retrieval method, by translating both images and text descriptions to a shared meaning space represented by a single < object, action, scene > tuple. A description for a query image is produced by retrieving whole image descriptions via this meaning space from a set of image descriptions.
- The retrieval method relies on collecting and filtering a large data set of images from the internet to produce a novel web-scale captioned photo collection.
Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs ( NIPS 2011 )的更多相关文章
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- [Paper Reading]--Exploiting Relevance Feedback in Knowledge Graph
<Exploiting Relevance Feedback in Knowledge Graph> Publication: KDD 2015 Authors: Yu Su, Sheng ...
- Paper Reading: Perceptual Generative Adversarial Networks for Small Object Detection
Perceptual Generative Adversarial Networks for Small Object Detection 2017-07-11 19:47:46 CVPR 20 ...
- Paper Reading: In Defense of the Triplet Loss for Person Re-Identification
In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes ...
- Paper Reading - Attention Is All You Need ( NIPS 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1706.03762 Motivation: The inherently sequential nature of ...
- Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...
- Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306 Main Points: An Alignment Model: Convolutional Ne ...
- Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...
随机推荐
- Beyond Compare 命令行生成目录下所有文件比对的Html网页report
MAC环境下,使用Beyond Compare命令行生成两个文件夹差异的html,按目录递归生成. #1. 创建compare #2. 创建compare/old #3. compare/new #4 ...
- 容易忽略的expect脚本问题,暗藏的僵尸进程,wait命令不要漏掉
问题描述 前几天有个小需求,用到expect脚本去循环的发送一些数据,主要问题代码如下: #! /usr/bin/expect while {true} { set timeout 60 spawn ...
- Token ,Cookie和Session的区别
在做接口测试时,经常会碰到请求参数为token的类型,但是可能大部分测试人员对token,cookie,session的区别还是一知半解. Cookie cookie 是一个非常具体的东西,指的就是浏 ...
- Kafka 部署指南-好久没有更新博客了
最近到了一家新公司,很多全新技术栈要理解.每天都在看各类 English Offcial Document,我的宗旨是我既然看懂了,就写下来分享,这是第一篇. 基本需求: 1.已有 zookeeper ...
- Extjs 中callParent的作用
callParent 是 Sencha 类系统提供的一个用于调用你父/祖先类中的方法. 这个通常用于当你 继承一个框架类 或者 覆写一个类中提供的方法(比如 onRender) 时. 当你在一个带参数 ...
- 前端提交表单两种验证方式记录 jq或h5 required属性
JQuery: <form id="form"> <input type="text" name="aaa"> &l ...
- Redis的特性以及优势(附官网)
NoSQL:一类新出现的数据库(not only sql) 泛指非关系型的数据库 不支持SQL语法 存储结构跟传统关系型数据库中的那种关系表完全不同,nosql中存储的数据都是KV形式 NoSQL的世 ...
- 使用jdk生成自签发证书(过程总结)
前言: 最近在做华为NB-IoT接口开发,需要用到双向认证,就去学了一下. 然后我将过程总结了一下. 相关华为论坛链接:http://developer.huawei.com/ict/forum/th ...
- pow函数(数学次方)在c语言的用法,两种编写方法实例( 计算1/1-1/2+1/3-1/4+1/5 …… + 1/99 - 1/100 的值)
关于c语言里面pow函数,下面借鉴了某位博主的一篇文章: 头文件:#include <math.h> pow() 函数用来求 x 的 y 次幂(次方),x.y及函数值都是double型 , ...
- IDL返回众数(数组中出现次数最多的值)
对于整型数组,可以直接利用histogram函数可以实现,示例如下: IDL>array = [1, 1, 2 , 4, 1, 3, 3, 2, 4, 5, 3, 2, 2, 1, 2, 6, ...