Paper Reading - Learning to Evaluate Image Captioning ( CVPR 2018 ) ★
Link of the Paper: https://arxiv.org/abs/1806.06422
Innovations:
- The authors propose a novel learning-based discriminative evaluation metric that is trained directly to distinguish between human-written and machine-generated captions. They train an automatic critique to tell generated captions apart from human-written ones. They then score a candidate caption by how successfully it fools the critique. Formally, given a critique parametrized by Θ, a reference image i, and a generated caption c, the score is the probability that the caption is human-written, as assigned by the critique: score_Θ(c, i) = P(c is human-written | i, Θ). More generally, the reference image represents the context in which the generated caption is evaluated. A reference caption can also be added to the context, to give further information about the relevance and salience of the image content. Let C(i) denote the context of image i. A reference caption c_r can then be included as part of the context, i.e. c_r ∈ C(i), and the score becomes score_Θ(c, i) = P(c is human-written | C(i), Θ).

- To systematically create pathological sentences, the authors define several transformations that generate unnatural sentences which might nevertheless get high scores from an evaluation metric. Their proposed data augmentation scheme uses these transformations to generate a large number of negative examples. Formally, a transformation T takes an image-caption dataset and generates a new one: T({(c, i) ∈ D}; γ) = {(c_1', i_1'), ..., (c_n', i_n')}. Here i and i_k' are images, c and c_k' are captions, D is a list of caption-image tuples representing the original dataset, and γ is a hyper-parameter that controls the strength of the transformation. Specifically, the authors define the following three transformations to generate pathological image-caption pairs:
- Random Captions ( RC ): To ensure the metric pays attention to the image content, they randomly sample human-written captions from other images in the training set: T_RC(D; γ) = {(c', i) | (c, i), (c', i') ∈ D, i' ∈ N_γ(i)}, where N_γ(i) is the set of images among the top γ percent nearest neighbors of image i.
- Word Permutation ( WP ): To make sure the metric pays attention to sentence structure, the authors randomly permute at least 2 words in the reference caption: T_WP(D; γ) = {(c', i) | (c, i) ∈ D, c' ∈ P_γ(c) \ {c}}, where P_γ(c) is the set of all sentences generated by permuting γ percent of the words in caption c.
- Random Word ( RW ): To explore rare words, the authors replace between 2 and all of the words of the reference caption with random words from the vocabulary: T_RW(D; γ) = {(c', i) | (c, i) ∈ D, c' ∈ W_γ(c) \ {c}}, where W_γ(c) is the set of all sentences generated by randomly replacing γ percent of the words in caption c.

- The authors propose a systematic approach to measure the robustness of an evaluation metric to a given pathological transformation.
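The three transformations and a robustness check can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. Captions are whitespace-tokenized strings, and γ is given as a fraction in [0, 1] rather than a percentage. The nearest-neighbour sets N_γ(i) are assumed to be precomputed and passed in. `robustness` is an assumed formalization, since the notes do not spell out the exact definition: it is the fraction of (original, transformed) pairs on which a metric scores the original strictly higher. The toy metric is a BLEU-1-style clipped unigram precision against a single reference. All function names are hypothetical.

```python
import random
from collections import Counter

def random_captions(dataset, neighbors, rng=random):
    """T_RC: pair image i with a human caption of some image i' in N_gamma(i).

    Assumption: neighbors[i] is a precomputed list standing in for N_gamma(i).
    """
    by_image = {}
    for c, i in dataset:
        by_image.setdefault(i, []).append(c)
    return [(rng.choice(by_image[rng.choice(neighbors[i])]), i) for _, i in dataset]

def _num_changed(words, gamma):
    # At least 2 words, at most the whole caption.
    return min(len(words), max(2, round(gamma * len(words))))

def word_permutation(caption, gamma, rng=random, tries=100):
    """T_WP: shuffle a random gamma-fraction of word positions (None if c' == c is unavoidable)."""
    words = caption.split()
    for _ in range(tries):
        positions = rng.sample(range(len(words)), _num_changed(words, gamma))
        targets = positions[:]
        rng.shuffle(targets)
        new = words[:]
        for src, dst in zip(positions, targets):
            new[dst] = words[src]
        if new != words:  # enforce c' in P_gamma(c) \ {c}
            return " ".join(new)
    return None

def random_word(caption, gamma, vocab, rng=random, tries=100):
    """T_RW: replace a random gamma-fraction of words with random vocabulary words."""
    words = caption.split()
    for _ in range(tries):
        new = words[:]
        for pos in rng.sample(range(len(words)), _num_changed(words, gamma)):
            new[pos] = rng.choice(vocab)
        if new != words:  # enforce c' in W_gamma(c) \ {c}
            return " ".join(new)
    return None

def unigram_precision(candidate, reference):
    """Toy word-overlap metric: clipped unigram precision (BLEU-1 without brevity penalty)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(n, ref[w]) for w, n in cand.items())
    return overlap / max(1, sum(cand.values()))

def robustness(metric, pairs):
    """Assumed definition: share of ((c, i), (c', i')) pairs with metric(c, i) > metric(c', i')."""
    wins = sum(1 for (c, i), (c2, i2) in pairs if metric(c, i) > metric(c2, i2))
    return wins / len(pairs)

rng = random.Random(0)
refs = {"img1": "a dog runs on the green grass"}  # hypothetical reference caption
human = ("a brown dog is running on grass", "img1")
pairs = [(human, (word_permutation(human[0], 1.0, rng), "img1")) for _ in range(20)]

def bleu1(caption, image):
    return unigram_precision(caption, refs[image])

print(robustness(bleu1, pairs))  # 0.0: permuted captions keep exactly the same unigram overlap
```

Ties count as failures here. As a result, an order-insensitive overlap metric gets a robustness of 0 against T_WP, which is exactly the kind of blind spot discussed under General Points.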
General Points:
- Commonly used evaluation metrics for image captioning include BLEU, METEOR, ROUGE, CIDEr and SPICE. These metrics face two challenges. First, many of them fail to correlate well with human judgments. Metrics based on word overlap between candidate and reference captions struggle to capture the semantic meaning of a sentence, and therefore often correlate poorly with human judgments. Second, each evaluation metric has well-known blind spots, and rule-based metrics are often too inflexible to respond to new pathological cases.
- Compact Bilinear Pooling ( CBP ) was shown in "Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding" to be very effective at combining heterogeneous image and text information.
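The following hedged NumPy sketch makes score_Θ(c, i) concrete. It shows a toy critique that fuses an image feature with a caption feature using CBP. The CBP is implemented with the Tensor Sketch construction used in the MCB paper: count-sketch both vectors, then take their circular convolution via the FFT. Everything around it is an assumption for illustration only. The features are random stand-ins for real image and caption encodings, and a single logistic layer with hypothetical parameters `w`, `b` plays the role of Θ. Only image context is shown; a reference caption would enter as extra context. `CompactBilinearPooling` and `critique_score` are hypothetical names, not the authors' code.

```python
import numpy as np

class CompactBilinearPooling:
    """Tensor Sketch approximation of the outer product x ⊗ y in d dimensions."""

    def __init__(self, dim_x, dim_y, d, seed=0):
        rng = np.random.default_rng(seed)
        # Random hash buckets h and signs s are drawn once and then kept fixed.
        self.h1, self.h2 = rng.integers(0, d, size=dim_x), rng.integers(0, d, size=dim_y)
        self.s1 = rng.choice([-1.0, 1.0], size=dim_x)
        self.s2 = rng.choice([-1.0, 1.0], size=dim_y)
        self.d = d

    def _sketch(self, v, h, s):
        out = np.zeros(self.d)
        np.add.at(out, h, s * v)  # count sketch: out[h[j]] += s[j] * v[j]
        return out

    def __call__(self, x, y):
        px = self._sketch(x, self.h1, self.s1)
        py = self._sketch(y, self.h2, self.s2)
        # Circular convolution of the two sketches, computed in the FFT domain.
        return np.fft.irfft(np.fft.rfft(px) * np.fft.rfft(py), n=self.d)

def critique_score(image, caption, cbp, w, b):
    """Toy score_Θ(c, i) = P(c is human-written | i, Θ) with Θ = (w, b) (hypothetical)."""
    logit = w @ cbp(image, caption) + b
    return 0.5 * (1.0 + np.tanh(logit / 2.0))  # numerically stable sigmoid

rng = np.random.default_rng(0)
image_feat, caption_feat = rng.normal(size=2048), rng.normal(size=512)  # stand-in features
cbp = CompactBilinearPooling(2048, 512, d=8000)
w, b = rng.normal(scale=1e-3, size=8000), 0.0  # untrained critique weights
print(critique_score(image_feat, caption_feat, cbp, w, b))
```

In training, the critique would be fit so that this probability is high for human captions and low for generated or transformed ones. The hash functions stay fixed, so the pooling itself has no trainable parameters.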