Link to the Paper: http://papers.nips.cc/paper/4470-im2text-describing-images-using-1-million-captioned-photographs.pdf

Main Points:

  1. A large novel data set containing images from the web with associated captions written by people, filtered so that the descriptions are likely to refer to visual content.
  2. A description generation method that uses global image representations to retrieve captions from the data set and transfer them to a query image. The authors compute the global similarity (a sum of gist similarity and tiny-image color similarity) between the query image and each image in their large web collection of captioned images, find the closest matching image (or images), and transfer that image's description to the query image.
  3. A description generation method that utilizes both global representations and direct estimates of image content (objects, actions, stuff, attributes, and scenes) to produce relevant image descriptions.
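
The global retrieval step in point 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the gist and tiny-image color descriptors are already computed and stored as plain lists of floats, and it uses cosine similarity for both terms, which is an assumption made here. The names `global_similarity`, `transfer_caption`, and the dictionary keys are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def global_similarity(query, candidate):
    """Sum of gist similarity and tiny-image color similarity.

    Assumption: both descriptors are precomputed; cosine similarity
    stands in for whatever similarity measure the paper uses.
    """
    return (cosine_similarity(query["gist"], candidate["gist"])
            + cosine_similarity(query["tiny"], candidate["tiny"]))


def transfer_caption(query, collection, k=1):
    """Return the captions of the k captioned images closest to the query."""
    ranked = sorted(collection,
                    key=lambda item: global_similarity(query, item),
                    reverse=True)
    return [item["caption"] for item in ranked[:k]]


# Toy collection with made-up descriptors and captions.
collection = [
    {"gist": [1.0, 0.0, 0.0], "tiny": [0.9, 0.1],
     "caption": "A boat on the lake at sunset."},
    {"gist": [0.0, 1.0, 0.0], "tiny": [0.1, 0.9],
     "caption": "A dog running across the grass."},
]
query = {"gist": [0.9, 0.1, 0.0], "tiny": [0.8, 0.2]}
captions = transfer_caption(query, collection)
print(captions)
```

With a collection of a million images, the linear scan in `transfer_caption` would likely be replaced by an approximate nearest-neighbor index, but the scoring logic stays the same.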

Other Key Points:

  1. The authors expect that image captioning will help advance progress toward more complex human recognition goals, such as how to tell the story behind an image.
  2. An earlier approach, from "Every Picture Tells a Story: Generating Sentences from Images", produces image descriptions via a retrieval method: it maps both images and text descriptions into a shared meaning space represented by a single <object, action, scene> tuple. A description for a query image is then produced by retrieving whole image descriptions from a set of image descriptions via this meaning space.
  3. The retrieval method relies on collecting and filtering a large data set of images from the internet to produce a novel web-scale captioned photo collection.
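
The meaning-space retrieval described in point 2 can be illustrated with a small sketch. It assumes the <object, action, scene> tuples for the query image and for each candidate sentence have already been predicted, and it scores a pair by counting matching slots. That scoring rule is a deliberate simplification chosen here for illustration, not the measure used in the cited paper, and the names `tuple_similarity` and `retrieve_description` are hypothetical.

```python
def tuple_similarity(a, b):
    """Count matching slots between two (object, action, scene) tuples."""
    return sum(1 for x, y in zip(a, b) if x == y)


def retrieve_description(image_tuple, sentences):
    """Return the sentence whose meaning tuple best matches the image tuple."""
    best = max(sentences,
               key=lambda s: tuple_similarity(image_tuple, s["meaning"]))
    return best["text"]


# Toy sentence pool with made-up meaning tuples.
sentences = [
    {"meaning": ("horse", "ride", "field"),
     "text": "A man rides a horse through a field."},
    {"meaning": ("plane", "fly", "sky"),
     "text": "A plane flies across the sky."},
]
image_tuple = ("horse", "stand", "field")
best = retrieve_description(image_tuple, sentences)
print(best)
```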

Source: Paper Reading - Im2Text: Describing Images Using 1 Million Captioned Photographs (NIPS 2011)
