Paper Reading - Learning to Evaluate Image Captioning ( CVPR 2018 ) ★

Link of the Paper: https://arxiv.org/abs/1806.06422

Innovations:

The authors propose a novel learning-based discriminative evaluation metric that is trained directly to distinguish human-written captions from machine-generated ones. They train an automatic critique for this task and then score each candidate caption by how well it fools the critique. Formally, given a critique parametrized by Θ, a reference image i, and a generated caption c, the score is the probability, as assigned by the critique, that the caption is human-written: score_Θ(c, i) = P(c is human written | i, Θ).

More generally, the reference image represents the context in which the generated caption is evaluated. To provide further information about the relevance and salience of the image content, a reference caption can additionally be supplied as part of the context. Let C(i) denote the context of image i; a reference caption c can then be included in the context, i.e. c ∈ C(i), where c here denotes the reference caption rather than the candidate being scored. The score with context becomes score_Θ(c, i) = P(c is human written | C(i), Θ).
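
The score defined above can be illustrated with a very small stand-in critique. This is only a sketch under stated assumptions: the function name `critique_score`, the fixed-length feature vectors, and the logistic model over element-wise products are all hypothetical choices made for illustration, not the architecture used in the paper.

```python
import math


def critique_score(theta, caption_vec, image_vec, reference_vec=None):
    """Toy critique: returns P(caption is human written | context, theta).

    Assumptions (not from the paper): the candidate caption, the image and
    the optional reference caption are already encoded as vectors of equal
    length, and the critique is a logistic model over element-wise products.
    """
    # Context always contains the image ...
    features = [c * v for c, v in zip(caption_vec, image_vec)]
    # ... and may additionally contain a reference caption.
    if reference_vec is not None:
        features += [c * r for c, r in zip(caption_vec, reference_vec)]
    if len(theta) != len(features):
        raise ValueError("theta must have one weight per feature")
    logit = sum(w * f for w, f in zip(theta, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Image-only context: two features, two weights.
print(critique_score([1.0, 1.0], [1.0, 0.0], [1.0, 0.0]))
# Context extended with a reference caption: four features, four weights.
print(critique_score([1.0, 1.0, 0.5, 0.5], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0]))
```

The only point of the sketch is the interface: the output is a probability in (0, 1), and adding a reference caption changes what the critique conditions on, not what is being scored.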

To systematically create pathological sentences, the authors define several transformations that generate unnatural sentences which might nevertheless receive high scores from an evaluation metric. Their proposed data augmentation scheme uses these transformations to generate a large number of negative examples. Formally, a transformation Τ takes an image-caption dataset and generates a new one: Τ({(c, i) ∈ D}; γ) = {(c₁', i₁'), ..., (c_n', i_n')}, where i and i₁', ..., i_n' are images, c and c₁', ..., c_n' are captions, D is a list of caption-image tuples representing the original dataset, and γ is a hyper-parameter that controls the strength of the transformation. Specifically, the authors define the following three transformations to generate pathological image-caption pairs:
- Random Captions ( RC ): To ensure that the metric pays attention to the image content, the authors randomly sample human-written captions from other images in the training set: T_RC(D; γ) = {(c', i) | (c, i), (c', i') ∈ D, i' ∈ N_γ(i)}, where N_γ(i) is the set of images that are the top γ percent nearest neighbors of image i.
- Word Permutation ( WP ): To ensure that the metric pays attention to sentence structure, the authors randomly permute at least 2 words in the reference caption: T_WP(D; γ) = {(c', i) | (c, i) ∈ D, c' ∈ P_γ(c) \ {c}}, where P_γ(c) is the set of all sentences generated by permuting γ percent of the words in caption c.
- Random Word ( RW ): To explore rare words, the authors replace from 2 to all words of the reference caption with random words from the vocabulary: T_RW(D; γ) = {(c', i) | (c, i) ∈ D, c' ∈ W_γ(c) \ {c}}, where W_γ(c) is the set of all sentences generated by randomly replacing γ percent of the words in caption c.
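
The three transformations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the whitespace tokenization, the treatment of γ as a fraction in [0, 1], the retry limit, and the precomputed `neighbors` mapping standing in for N_γ(i) are all assumptions.

```python
import random


def random_captions(dataset, neighbors, rng):
    """T_RC: pair each image with a human caption written for a neighbor image.

    dataset: list of (caption, image_id); neighbors: dict mapping an image_id
    to the ids in N_gamma(i), assumed to be precomputed elsewhere.
    """
    captions_of = {}
    for caption, image in dataset:
        captions_of.setdefault(image, []).append(caption)
    out = []
    for _, image in dataset:
        candidates = [c for other in neighbors.get(image, []) if other != image
                      for c in captions_of.get(other, [])]
        if candidates:
            out.append((rng.choice(candidates), image))
    return out


def word_permutation(caption, gamma, rng, max_tries=100):
    """T_WP: permute a gamma fraction of the words (at least 2); result != caption."""
    words = caption.split()
    if len(set(words)) < 2:
        return None  # no permutation can change this caption
    k = min(len(words), max(2, round(gamma * len(words))))
    for _ in range(max_tries):
        positions = rng.sample(range(len(words)), k)
        targets = positions[:]
        rng.shuffle(targets)
        permuted = words[:]
        for src, dst in zip(positions, targets):
            permuted[dst] = words[src]
        if permuted != words:
            return " ".join(permuted)
    return None


def random_word(caption, gamma, vocabulary, rng, max_tries=100):
    """T_RW: replace a gamma fraction of the words (at least 2) with random vocabulary words."""
    words = caption.split()
    if not words:
        return None
    k = min(len(words), max(2, round(gamma * len(words))))
    for _ in range(max_tries):
        replaced = words[:]
        for pos in rng.sample(range(len(words)), k):
            replaced[pos] = rng.choice(vocabulary)
        if replaced != words:
            return " ".join(replaced)
    return None


rng = random.Random(0)
caption = "a man rides a horse"
print(word_permutation(caption, 0.5, rng))
print(random_word(caption, 0.5, ["zebra", "pizza", "umbrella"], rng))
```

Each function rejects outputs identical to the input, mirroring the "\ {c}" in the definitions of T_WP and T_RW.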

The authors propose a systematic approach to measure the robustness of an evaluation metric to a given pathological transformation.

General Points:

Commonly used evaluation metrics for Image Captioning: BLEU, METEOR, ROUGE, CIDEr, SPICE. These metrics face two challenges. Firstly, many of them fail to correlate well with human judgments. Metrics that measure word overlap between candidate and reference captions struggle to capture the semantic meaning of a sentence, and therefore often correlate poorly with human judgments. Secondly, each evaluation metric has its own well-known blind spots, and rule-based metrics are often too inflexible to respond to new pathological cases.
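
The weakness of pure word overlap can be seen in a tiny example. The function below is a deliberately simplified clipped unigram precision (in the spirit of BLEU-1 without the brevity penalty); it is an illustrative assumption, not the exact definition of any of the metrics listed above, and the example sentences are made up.

```python
from collections import Counter


def unigram_precision(candidate, reference):
    """Fraction of candidate words that also occur in the reference (clipped counts)."""
    cand_words = candidate.split()
    if not cand_words:
        return 0.0
    ref_counts = Counter(reference.split())
    matched = sum(min(n, ref_counts[w]) for w, n in Counter(cand_words).items())
    return matched / len(cand_words)


reference = "a man rides a horse on the beach"
scrambled = "beach the on horse a rides man a"              # same words, no structure
paraphrase = "someone is riding horseback along the shore"  # same meaning, new words

print(unigram_precision(scrambled, reference))   # 1.0
print(unigram_precision(paraphrase, reference))  # about 0.14
```

A scrambled sentence gets a perfect score while a reasonable paraphrase is heavily penalized, which is the kind of mismatch with human judgment described above.
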
Compact Bilinear Pooling ( CBP ) has been shown, in "Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding", to be very effective at combining heterogeneous image and text information.
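
For readers unfamiliar with CBP, the core idea can be sketched as follows: each input vector is projected with a Count Sketch, and the two sketches are combined by circular convolution, which equals the Count Sketch of their outer product. The code is a minimal pure-Python illustration with hypothetical function names and toy dimensions; practical implementations compute the convolution with FFTs and use much larger output sizes.

```python
import random


def make_sketch(dim_in, dim_out, rng):
    """Draw a random bucket h[k] in [0, dim_out) and sign s[k] in {-1, +1} per input index."""
    h = [rng.randrange(dim_out) for _ in range(dim_in)]
    s = [rng.choice((-1, 1)) for _ in range(dim_in)]
    return h, s


def count_sketch(x, h, s, dim_out):
    """Project x to dim_out dimensions: y[h[k]] += s[k] * x[k]."""
    y = [0.0] * dim_out
    for value, bucket, sign in zip(x, h, s):
        y[bucket] += sign * value
    return y


def circular_convolution(a, b):
    """Direct O(d^2) circular convolution; an FFT computes the same result faster."""
    d = len(a)
    out = [0.0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % d] += ai * bj
    return out


def compact_bilinear(image_vec, text_vec, sketch_img, sketch_txt, dim_out):
    """Fuse two vectors by convolving their Count Sketches."""
    (h1, s1), (h2, s2) = sketch_img, sketch_txt
    return circular_convolution(
        count_sketch(image_vec, h1, s1, dim_out),
        count_sketch(text_vec, h2, s2, dim_out),
    )


rng = random.Random(0)
dim_out = 8
sketch_img = make_sketch(3, dim_out, rng)  # toy 3-d "image" feature
sketch_txt = make_sketch(4, dim_out, rng)  # toy 4-d "text" feature
fused = compact_bilinear([1.0, 2.0, 3.0], [0.5, -1.0, 2.0, 4.0],
                         sketch_img, sketch_txt, dim_out)
print(len(fused), fused)
```

The fused vector has a fixed size `dim_out` regardless of the two input sizes, which is what makes the approach practical for combining image and text features of different dimensionality.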
