Link of the Paper: https://arxiv.org/abs/1806.06422

Innovations:

  • The authors propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions. They train an automatic critique to distinguish generated captions from human-written ones, and then score candidate captions by how successful they are in fooling the critique. Formally, given a critique parametrized by Θ, a reference image i, and a generated caption c, the score is defined as the probability for the caption of being human-written, as assigned by the critique: scoreΘ(c, i) = P(c is human written | i, Θ). More generally, the reference image represents the context in which the generated caption is evaluated. To provide further information about the relevance and salience of the image content, a reference caption can additionally be supplied to the context. Let C(i) denotes the context of image i, then reference caption c could be included as part of context, i.e. cC(i). The score with context becomes scoreΘ(c, i) = P(c is human written | C(i), Θ).

    

  • To systematically create pathological sentences, the authors define several transformations to generate unnatural sentences that might get high scores in an evaluation metric. Their proposed data augmentation scheme uses these transformations to generate large number of negative examples. Formally, a transformation Τ takes an image-caption dataset and generates a new one: Τ({(c, i) ∈ D}; γ) = {(c1', i1'), ..., (cn', in')}, where i, ii' are images, c, ci' are captions, D is a list of caption-image tuples representing the original dataset, and γ is a hyper-parameter that controls the strength of the transformation. Specifically, authors define following three transformations to generate pathological image-captions pairs:

    • Random Captions ( RC ): To ensure the metric pays attention to the image content, they randomly sample human written captions from other images in the training set: TRC(D; γ) = {(c', i) | (c, i), (c', i') ∈ D, i'Nγ(i)}, where Nγ(i) represents the set of images that are top γ percent nearest neighbors to image i.
    • Word Permutation ( WP ): To make sure that their metric pays attention to sentence structure, authors randomly permute at least 2 words in the reference caption: TWP(D; γ) = {(c', i) | (c, i) ∈ D, c'Pγ(c) \ {c}}, where Pγ(c) represents all sentences generated by permuting γ percent of words in caption c.
    • Random Word ( RW ): To explore rare words authors replace from 2 to all words of the reference caption with random words from the vocabulary: TRW(D; γ) = {(c', i) | (c, i) ∈ D, c'Wγ(c) \ {c}}, where Wγ(c) represents all sentences generated by randomly replacing γ percent words from caption c.

  • The authors propose a systematic approach to measure the robustness of an evaluation metric to a given pathological transformation.

General Points:

  • Commonly used evaluation metrics for Image Captioning: BLEU, METEOR, ROUGE, CIDEr, SPICE. These metrics face two challenges. Firstly, many metrics fail to correlate well with human judgments. Metrics based on measuring word overlap between candidate and reference captions find it difficult to capture semantic meaning of a sentence, therefore often lead to bad correlation with human judgments. Secondly, each evaluation metric has its well-known blind spot, and rule-based metrics are often inflexible to be responsive to new pathological cases.
  • Compact Bilinear Pooling ( CBP ) has been demonstrated in Multimodal compact bilinear pooling for visual question answering and visual grounding to be very effective in combining heterogeneous information of image and text.

Paper Reading - Learning to Evaluate Image Captioning ( CVPR 2018 ) ★的更多相关文章

  1. Paper Reading - Convolutional Image Captioning ( CVPR 2018 )

    Link of the Paper: https://arxiv.org/abs/1711.09151 Motivation: LSTM units are complex and inherentl ...

  2. Paper Reading - Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images ( ICCV 2015 )

    Link of the Paper: https://arxiv.org/pdf/1504.06692.pdf Innovations: The authors propose the Novel V ...

  3. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  4. 读paper笔记[Learning to rank]

    读paper笔记[Learning to rank] by Jiawang 选读paper: [1] Ranking by calibrated AdaBoost, R. Busa-Fekete, B ...

  5. 在矩池云上复现 CVPR 2018 LearningToCompare_FSL 环境

    这是 CVPR 2018 的一篇少样本学习论文:Learning to Compare: Relation Network for Few-Shot Learning 源码地址:https://git ...

  6. 爬取CVPR 2018过程中遇到的坑

    爬取 CVPR 2018 过程中遇到的坑 使用语言及模块 语言: Python 3.6.6 模块: re requests lxml bs4 过程 一开始都挺顺利的,先获取到所有文章的链接再逐个爬取获 ...

  7. Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★

    Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...

  8. Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★

    Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...

  9. Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1412.2306 Main Points: An Alignment Model: Convolutional Ne ...

随机推荐

  1. ConfigurationManager.AppSettings方法

    一 配置文件概述: 应用程序配置文件是标准的 XML 文件,XML 标记和属性是区分大小写的.它是可以按需要更改的,开发人员可以使用配置文件来更改设置,而不必重编译应用程序.配置文件的根节点是conf ...

  2. CGAffineTransform 视频旋转(转)

    记录下视频旋转 ////////////////////////////////////////////// - (void)test:(NSURL *)url transformUrl:(NSURL ...

  3. python类的反射使用方法

    曾经,博主的房东养了只金毛叫奶茶,今天就拿它当议题好了. 博主写本文时正在被广州的蚊子围攻. #反射练习 class animal(object): def __init__(self,name,fo ...

  4. MySQL部分从库上面因为大量的临时表tmp_table造成慢查询

    背景描述 # Time: :: # User@Host: **[**] @ [**] Id: ** # Killed: # Query_time: Rows_examined: Rows_affect ...

  5. MySQL高可用架构故障自动转移插件MHA

    mha高可用架构是目前mysql高可用故障转移比较成熟的解决方案.MHA插件复杂监控mysql主节点的健康情况.在主节点宕机后,MHA把binlog通过ssh传到从节点进行重做补齐.并提升其中一个从节 ...

  6. Django学习笔记2

    1.BookInfo.objects.all() objects:是Manager类型的对象,用于与数据库进行交互 当定义模型类时没有指定管理器,则Django会为模型类提供一个名为objects的管 ...

  7. 新人成长之入门Vue.js弹窗Dialog介绍(二)

    前言 在上一篇博文中介绍了Vue.js的常用指令,今天总结归纳一下弹窗Dialog的使用,弹窗经常被使用在一些表单的增删改查啊,或者弹出一些提示信息等等,在保留当前页面状态的情况下,告知用户并承载相关 ...

  8. 使用NPOI将数据导出Excel

    NPOI.HSSF.UserModel.HSSFWorkbook book = new NPOI.HSSF.UserModel.HSSFWorkbook(); NPOI.SS.UserModel.IS ...

  9. 第3天 Java基础语法

    第3天 Java基础语法 今日内容介绍 引用数据数据类型(Scanner.Random) 流程控制语句(if.for.while.dowhile.break.continue) 引用数据类型 Scan ...

  10. RMI入门HelloWorld

    java RMI(Remote Method Invocation)是一种基于java远程调用技术,是对RPC的java实现,可以在不同主机上进行通信与方法调用.PRC通信原理如图: 方法调用从客户对 ...