Paper Reading - Learning to Evaluate Image Captioning ( CVPR 2018 ) ★
Link of the Paper: https://arxiv.org/abs/1806.06422
Innovations:
- The authors propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions. They train an automatic critique to distinguish generated captions from human-written ones, and then score candidate captions by how successful they are in fooling the critique. Formally, given a critique parametrized by Θ, a reference image i, and a generated caption c, the score is defined as the probability for the caption of being human-written, as assigned by the critique: scoreΘ(c, i) = P(c is human written | i, Θ). More generally, the reference image represents the context in which the generated caption is evaluated. To provide further information about the relevance and salience of the image content, a reference caption can additionally be supplied to the context. Let C(i) denotes the context of image i, then reference caption c could be included as part of context, i.e. c∈C(i). The score with context becomes scoreΘ(c, i) = P(c is human written | C(i), Θ).
- To systematically create pathological sentences, the authors define several transformations to generate unnatural sentences that might get high scores in an evaluation metric. Their proposed data augmentation scheme uses these transformations to generate large number of negative examples. Formally, a transformation Τ takes an image-caption dataset and generates a new one: Τ({(c, i) ∈ D}; γ) = {(c1', i1'), ..., (cn', in')}, where i, ii' are images, c, ci' are captions, D is a list of caption-image tuples representing the original dataset, and γ is a hyper-parameter that controls the strength of the transformation. Specifically, authors define following three transformations to generate pathological image-captions pairs:
- Random Captions ( RC ): To ensure the metric pays attention to the image content, they randomly sample human written captions from other images in the training set: TRC(D; γ) = {(c', i) | (c, i), (c', i') ∈ D, i'∈Nγ(i)}, where Nγ(i) represents the set of images that are top γ percent nearest neighbors to image i.
- Word Permutation ( WP ): To make sure that their metric pays attention to sentence structure, authors randomly permute at least 2 words in the reference caption: TWP(D; γ) = {(c', i) | (c, i) ∈ D, c' ∈ Pγ(c) \ {c}}, where Pγ(c) represents all sentences generated by permuting γ percent of words in caption c.
- Random Word ( RW ): To explore rare words authors replace from 2 to all words of the reference caption with random words from the vocabulary: TRW(D; γ) = {(c', i) | (c, i) ∈ D, c' ∈ Wγ(c) \ {c}}, where Wγ(c) represents all sentences generated by randomly replacing γ percent words from caption c.
- The authors propose a systematic approach to measure the robustness of an evaluation metric to a given pathological transformation.
General Points:
- Commonly used evaluation metrics for Image Captioning: BLEU, METEOR, ROUGE, CIDEr, SPICE. These metrics face two challenges. Firstly, many metrics fail to correlate well with human judgments. Metrics based on measuring word overlap between candidate and reference captions find it difficult to capture semantic meaning of a sentence, therefore often lead to bad correlation with human judgments. Secondly, each evaluation metric has its well-known blind spot, and rule-based metrics are often inflexible to be responsive to new pathological cases.
- Compact Bilinear Pooling ( CBP ) has been demonstrated in Multimodal compact bilinear pooling for visual question answering and visual grounding to be very effective in combining heterogeneous information of image and text.
Paper Reading - Learning to Evaluate Image Captioning ( CVPR 2018 ) ★的更多相关文章
- Paper Reading - Convolutional Image Captioning ( CVPR 2018 )
Link of the Paper: https://arxiv.org/abs/1711.09151 Motivation: LSTM units are complex and inherentl ...
- Paper Reading - Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images ( ICCV 2015 )
Link of the Paper: https://arxiv.org/pdf/1504.06692.pdf Innovations: The authors propose the Novel V ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- 读paper笔记[Learning to rank]
读paper笔记[Learning to rank] by Jiawang 选读paper: [1] Ranking by calibrated AdaBoost, R. Busa-Fekete, B ...
- 在矩池云上复现 CVPR 2018 LearningToCompare_FSL 环境
这是 CVPR 2018 的一篇少样本学习论文:Learning to Compare: Relation Network for Few-Shot Learning 源码地址:https://git ...
- 爬取CVPR 2018过程中遇到的坑
爬取 CVPR 2018 过程中遇到的坑 使用语言及模块 语言: Python 3.6.6 模块: re requests lxml bs4 过程 一开始都挺顺利的,先获取到所有文章的链接再逐个爬取获 ...
- Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306 Main Points: An Alignment Model: Convolutional Ne ...
随机推荐
- 由Oracle 11g SYSAUX 和 SYSTEM 表空间回收引发的联想
0x00--目的 整理一下以前一个SYSTEM表空间和SYSAUX表空间使用率达到99%上限的处理思路和相关知识点,好记性不如烂笔头 0x01--表空间使用率现状 通过查询可得知目前表空间使用情况如下 ...
- Vue中把从后端取出的时间进行截取
未截取前 截取后 方法: </div>{{times}}</div> export default{ data() { return { // getTime储存从服务器请求回 ...
- ubuntu安装flashplayer插件三步走
1.去官网下载flash;2.解压3.复制.so文件到~/.mozilla/plugins/
- Xdebug 备注
安装步骤: 查看自己的环境是否已安装 Xdebug ,查看方法:使用phpinfo(),搜索 Xdebug 如果没有 如图: 如果没有:下一步确定你的PHP版本信息: Xdebug下载地址 https ...
- spring-quartz 定时器 给targetMethod传递参数
今天在做一个项目的时候,要给一个定时器任务的执行方法传递参数,在网上找了一下资料,可以使用arguments参数: <bean id="subsidyJobDetail" ...
- CSS动画详解及transform、transition、translate的区别
刚看完一节慕课网的css动画,在此总结下 1. 先说下 transform.transition.translate的区别 transform 和 transition是css的2个属性,transl ...
- 干货分享:QQ群排名霸屏优化规则靠前的新技术
谈起QQ群排名的优化规则,很多人又爱又恨,原因很简单,爱他的都是引流效果是非常好的,通过关键词搜索排名好的技术,能排到全国默认前三,叫人怎能不爱他,恨的原因也恨简单,无论你的群完善的再怎么好,好像都无 ...
- STM32(12)——CAN
简介: CAN是Controller Area Network,是 ISO 国际标准化的串行通信协议. CAN 控制器根据两根线上的电位差来判断总线电平.总线电平分为显性电平和隐性电平,二者必居其一 ...
- MySQL集群-PXC搭建以及使用innobackupex工具进行全局备份和增量备份
环境:centos7 vm1:10.154.47.236 vm2:10.154.52.189 vm3:10.105.12.50 目的:pxc使用三个节点构建mysql集群,使用innobackupex ...
- python学习——面对对象进阶
一.isinstance和issubclass isinstance(obj,cls)检查是否obj是否是类 cls 的对象 class Foo: pass a = Foo() print(isins ...