This post covers the following three papers:

Unpaired Image Captioning by Language Pivoting (ECCV 2018)

Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data (ECCV 2018)

Unsupervised Image Captioning (CVPR 2019)

1. Unpaired Image Captioning by Language Pivoting (ECCV 2018)

Abstract

The authors propose a language pivoting approach to the unpaired image captioning problem, i.e. image captioning without paired images and captions in the target language.

Our method can effectively capture the characteristics of an image captioner from the pivot language(Chinese) and align it to the target language (English) using another pivot-target (Chinese-English) sentence parallel corpus.

Introduction

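Encoder-decoder models need a large number of image-caption pairs for training, and labeled data at that scale is usually hard to obtain. Researchers have therefore started to explore unpaired data, or semi-supervised methods that exploit paired labeled data from other domains, to approach the unsupervised setting. In this paper, the authors use Chinese as the pivot language to bridge the gap between the input image and the target-language (English) caption. This requires two paired datasets, image-Chinese caption and Chinese-English, so that English captions can be generated for images without any paired image-English caption data.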

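The authors note that the idea comes from machine translation research, where pivot-based methods usually work in two steps: first translate the source language into the pivot language, then translate the pivot language into the target language. Image captioning differs from machine translation in several ways, though: (1) the sentence style and vocabulary distribution of the image-Chinese caption data and the Chinese-English data differ considerably; (2) errors in the source-to-pivot conversion propagate to the pivot-to-target stage.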
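
The two-step pivoting idea, and the way a stage-one error carries into stage two, can be sketched with toy lookup tables. This is only an illustration: the tables `IMAGE_TO_PIVOT` and `PIVOT_TO_TARGET`, the image ids, and the function names are hypothetical stand-ins for the trained image-to-pivot captioner and pivot-to-target translator, not the paper's models.

```python
from typing import Dict

# Hypothetical stand-in for the image-to-pivot captioner (image id -> Chinese caption).
# Assume both images actually show a dog; the caption for img_002 is a stage-one mistake.
IMAGE_TO_PIVOT: Dict[str, str] = {
    "img_001": "一只狗在草地上奔跑",
    "img_002": "一只猫在草地上奔跑",
}

# Hypothetical stand-in for the pivot-to-target translator (Chinese -> English).
PIVOT_TO_TARGET: Dict[str, str] = {
    "一只狗在草地上奔跑": "a dog is running on the grass",
    "一只猫在草地上奔跑": "a cat is running on the grass",
}


def image_to_pivot(image_id: str) -> str:
    """Stage 1: describe the image in the pivot language."""
    return IMAGE_TO_PIVOT[image_id]


def pivot_to_target(sentence: str) -> str:
    """Stage 2: translate the pivot sentence into the target language."""
    return PIVOT_TO_TARGET.get(sentence, "<unk>")


def caption_via_pivot(image_id: str) -> str:
    """Chain both stages; stage 2 never sees the image, only the pivot sentence."""
    return pivot_to_target(image_to_pivot(image_id))


if __name__ == "__main__":
    for image_id in ("img_001", "img_002"):
        print(image_id, "->", caption_via_pivot(image_id))
```

Because the second stage only sees the pivot sentence, the wrong Chinese caption for `img_002` is translated faithfully into a wrong English caption, which is the error propagation problem mentioned in point (2).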

AIC-ICC and AIC-MT are used as the training datasets, and two datasets (MSCOCO and Flickr30K) as the validation datasets.

Notation: i: source image; x: pivot language sentence; y: target language sentence; y_hat: ground truth captions in the target language (here, y_hat are captions randomly sampled from the MSCOCO training set and used to train the autoencoder).

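The idea of this paper is fairly easy to follow. The hard part is connecting Image-to-Pivot and Pivot-to-Target while overcoming the mismatch in language style and vocabulary distribution between the two datasets.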

2. Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data (ECCV 2018)

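The authors point out that existing captioning models tend to copy sentences or phrases from the training set; the generated captions are usually generic and templated, and lack the ability to be discriminative.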

GAN-based captioning models can improve sentence diversity, but perform relatively poorly on standard evaluation metrics.

The authors propose to couple the Captioning Module with a Self-retrieval Module in order to generate discriminative captions.
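
One way to picture the self-retrieval idea: a caption counts as discriminative if it can be used to pick out its own image from a set of distractors. The sketch below is an assumption-laden illustration, not the paper's exact formulation: the embeddings are hand-written 2-D vectors, and the softmax-over-cosine-similarity score (`self_retrieval_reward`) is a hypothetical reward chosen for simplicity.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def self_retrieval_reward(
    caption_vec: Sequence[float],
    image_vecs: List[Sequence[float]],
    own_index: int,
) -> float:
    """Probability of retrieving the caption's own image from the batch.

    Assumption: a softmax over cosine similarities; higher means the caption
    separates its image from the distractors more clearly.
    """
    sims = [cosine(caption_vec, img) for img in image_vecs]
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]  # subtract max for stability
    return exps[own_index] / sum(exps)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]   # two toy image embeddings
    discriminative = [1.0, 0.0]          # matches image 0 only
    generic = [1.0, 1.0]                 # equally close to both images
    print(self_retrieval_reward(discriminative, images, 0))
    print(self_retrieval_reward(generic, images, 0))
```

A generic caption that fits every image in the batch gets a reward near chance level, while a caption specific to its own image scores higher, which is the behaviour a self-retrieval signal is meant to encourage.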

3. Unsupervised Image Captioning (CVPR 2019)

This paper takes a truly unsupervised approach to image captioning: it does not rely on any labeled image-sentence pairs.

Compared with Unsupervised Machine Translation, Unsupervised Image Captioning is more challenging because images and text are two different modalities with a large gap between them.

The model consists of an image encoder, a sentence generator, and a sentence discriminator.

Encoder:

Any ordinary image encoder will do; the authors use Inception-V4.

Generator:

A decoder built from an LSTM.

Discriminator:

Implemented with an LSTM; it is used to distinguish whether a partial sentence is a real sentence from the corpus or is generated by the model.
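
Scoring partial sentences means the generator can receive a signal at every time step rather than only at the end of the sentence. The sketch below illustrates this under two loud assumptions: the per-step reward is taken to be the log of the discriminator's "real" probability for the prefix, and a toy bigram scorer (`toy_real_probability`) stands in for the LSTM discriminator. All names here are hypothetical.

```python
import math
from typing import List, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(tokens: List[str]) -> List[Bigram]:
    """Bigrams of a token list, with a start symbol prepended."""
    padded = ["<s>"] + tokens
    return list(zip(padded, padded[1:]))


def build_corpus_bigrams(corpus: List[List[str]]) -> Set[Bigram]:
    """Collect every bigram that occurs in the sentence corpus."""
    seen: Set[Bigram] = set()
    for sentence in corpus:
        seen.update(bigrams(sentence))
    return seen


def toy_real_probability(prefix: List[str], corpus_bigrams: Set[Bigram]) -> float:
    """Toy stand-in for the discriminator: smoothed share of known bigrams."""
    grams = bigrams(prefix)
    known = sum(1 for g in grams if g in corpus_bigrams)
    return (known + 0.5) / (len(grams) + 1.0)  # always strictly inside (0, 1)


def adversarial_rewards(tokens: List[str], corpus_bigrams: Set[Bigram]) -> List[float]:
    """One reward per time step: log-probability that the prefix looks real."""
    return [
        math.log(toy_real_probability(tokens[:t], corpus_bigrams))
        for t in range(1, len(tokens) + 1)
    ]


if __name__ == "__main__":
    corpus = build_corpus_bigrams([["a", "dog", "runs", "on", "the", "grass"]])
    print(adversarial_rewards(["a", "dog", "runs"], corpus))
    print(adversarial_rewards(["runs", "a", "dog"], corpus))
```

A fluent prefix accumulates higher (less negative) rewards than a scrambled one, so the generator is pushed toward sentences the discriminator cannot tell apart from the corpus.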

Training:

Since there are no paired image-sentence data, the model cannot be trained in a supervised way, so the authors design three objectives to achieve Unsupervised Image Captioning:

Adversarial Caption Generation:

Visual Concept Distillation:

Bi-directional Image-Sentence Reconstruction:

Image Reconstruction: reconstruct the image features instead of the full image

Sentence Reconstruction: the discriminator can encode one sentence and project it into the common latent space, which can be viewed as one image representation related to the given sentence. The generator can reconstruct the sentence based on the obtained representation.
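
The image reconstruction direction can be sketched as follows: a sentence is encoded into the common latent space, and the result is compared with the image feature. This is a minimal sketch under stated assumptions: the word vectors in `WORD_VECS` are made up, averaging them stands in for the discriminator's sentence encoding and projection, and the squared L2 distance is an assumed choice of loss.

```python
from typing import Dict, List

# Hypothetical 2-D word vectors living in the same space as the image features.
WORD_VECS: Dict[str, List[float]] = {
    "dog": [1.0, 0.0],
    "grass": [0.5, 0.5],
    "car": [0.0, 1.0],
    "road": [0.2, 0.8],
}


def encode_sentence(tokens: List[str]) -> List[float]:
    """Toy sentence encoder: average of known word vectors.

    Stands in for the discriminator encoding a sentence and projecting it
    into the common latent space.
    """
    vecs = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    if not vecs:
        return [0.0, 0.0]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def image_reconstruction_loss(image_feature: List[float], tokens: List[str]) -> float:
    """Squared L2 distance between the image feature and its reconstruction."""
    reconstructed = encode_sentence(tokens)
    return sum((a - b) ** 2 for a, b in zip(image_feature, reconstructed))


if __name__ == "__main__":
    image_feature = [0.8, 0.2]  # pretend encoder output for a dog-on-grass image
    print(image_reconstruction_loss(image_feature, ["dog", "grass"]))
    print(image_reconstruction_loss(image_feature, ["car", "road"]))
```

A caption that matches the image reconstructs a feature close to the original, so its loss is small; an unrelated caption is penalized. Working on features rather than pixels keeps the target low-dimensional.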

Integration:

Generator:

Discriminator:

Initialization

It is challenging to adequately train the image captioning model from scratch with the given unpaired data, so an initialization pipeline is needed to pre-train the generator and the discriminator.

For the generator:

First, build a concept dictionary consisting of the object classes in the OpenImages dataset.

Second, train a concept-to-sentence (con2sen) model using the sentence corpus only.

Third, detect the visual concepts in each image using an existing visual concept detector, then use the detected concepts and the concept-to-sentence model to generate a pseudo caption for each image.

Fourth, train the generator with the pseudo image-caption pairs.
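
The four generator initialization steps can be sketched as a small data pipeline. Everything concrete here is a hypothetical placeholder: `CONCEPT_DICTIONARY` stands in for the OpenImages object classes, `DETECTIONS` for the output of a visual concept detector, and the template in `con2sen` for a trained concept-to-sentence model.

```python
from typing import Dict, List, Set, Tuple

# Step 1 (placeholder): a tiny concept dictionary instead of the OpenImages classes.
CONCEPT_DICTIONARY: Set[str] = {"dog", "grass", "car"}

# Placeholder detector output: image id -> raw detected labels.
DETECTIONS: Dict[str, List[str]] = {
    "img_001": ["dog", "grass", "blurry_blob"],
    "img_002": ["car"],
}


def detect_concepts(image_id: str) -> List[str]:
    """Step 3a: keep only detected labels that are in the concept dictionary."""
    return [c for c in DETECTIONS.get(image_id, []) if c in CONCEPT_DICTIONARY]


def con2sen(concepts: List[str]) -> str:
    """Step 2 (placeholder): a template instead of a trained con2sen model."""
    if not concepts:
        return "a photo"
    return "a photo of " + " and ".join(sorted(concepts))


def build_pseudo_pairs(image_ids: List[str]) -> List[Tuple[str, str]]:
    """Step 3b: pair each image with a pseudo caption built from its concepts."""
    return [(image_id, con2sen(detect_concepts(image_id))) for image_id in image_ids]


if __name__ == "__main__":
    # Step 4 would train the generator on these pairs; here they are only printed.
    for image_id, caption in build_pseudo_pairs(["img_001", "img_002"]):
        print(image_id, "->", caption)
```

The pseudo captions are noisy by construction, which is acceptable here because they only serve to give the generator a reasonable starting point before the unsupervised objectives take over.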

For the discriminator: it is initialized by training an adversarial sentence generation model on the sentence corpus.
