Image Paragraph Paper Collection
A Hierarchical Approach for Generating Descriptive Image Paragraphs (CVPR 2017) Li Fei-Fei.
Dataset: http://cs.stanford.edu/people/ranjaykrishna/im2p/index.html
Workflow:
1. Decompose the input image by detecting objects and other regions of interest.
2. Aggregate features across these regions to produce a pooled representation that richly expresses the image semantics.
3. Feed this feature vector into a hierarchical recurrent neural network composed of two levels: a sentence RNN and a word RNN.
4. The sentence RNN receives the image features, decides how many sentences to generate in the resulting paragraph, and produces an input topic vector for each sentence.
5. The word RNN uses this topic vector to generate the words of a single sentence.
Region Detector:
CNN+RPN
resize the image --> pass it through a CNN to get feature maps --> a region proposal network (RPN) processes the resulting feature maps --> regions of interest are projected onto the convolutional feature maps --> the corresponding region of the feature map is resized to a fixed size using bilinear interpolation and processed by two fully-connected layers to give a vector of dimension D for each region.
Given a dataset of images and ground-truth regions of interest, the region detector can be trained in an end-to-end fashion for object detection and for dense captioning.
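The "resize a region of the feature map to a fixed size using bilinear interpolation" step can be sketched in NumPy. This is only an illustration under simplifying assumptions, not the paper's implementation: the box uses integer feature-map coordinates, sampling points include the crop corners, and the names `bilinear_resize`, `region_feature`, and the output size 7 are hypothetical.

```python
import numpy as np

def bilinear_resize(feature, out_h, out_w):
    """Resize an (H, W, C) feature crop to (out_h, out_w, C) by bilinear interpolation."""
    H, W, _ = feature.shape
    # Evenly spaced sampling positions that include the crop corners.
    ys = np.linspace(0, H - 1, out_h)
    xs = np.linspace(0, W - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :, None]  # horizontal interpolation weights
    top = feature[y0][:, x0] * (1 - wx) + feature[y0][:, x1] * wx
    bottom = feature[y1][:, x0] * (1 - wx) + feature[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def region_feature(feature_map, box, size=7):
    """Crop a box (y_start, x_start, y_end, x_end) and resize it to size x size."""
    y_a, x_a, y_b, x_b = box
    crop = feature_map[y_a:y_b, x_a:x_b]
    return bilinear_resize(crop, size, size)

# Toy feature map whose value equals the x coordinate, so the result is easy to check.
feature_map = np.tile(np.arange(8.0)[None, :, None], (6, 1, 1))  # (6, 8, 1)
fixed = region_feature(feature_map, box=(1, 2, 5, 6), size=7)
print(fixed.shape)  # (7, 7, 1)
```

In the pipeline described above, each fixed-size crop would then be flattened and passed through the two fully-connected layers to obtain the D-dimensional region vector.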
Region Pooling:
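The pooled vector is v_p = max_i (W_pool * v_i + b_pool), where the maximum is taken elementwise over regions, W_pool and b_pool are learned parameters, and v_1, ..., v_M is the set of vectors produced by the region detector.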
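A minimal NumPy sketch of this pooling step, assuming the region vectors are stacked as rows of an (M, D) array. The function name `pool_regions` and the toy dimensions are hypothetical, and random values stand in for learned parameters.

```python
import numpy as np

def pool_regions(region_vectors, W_pool, b_pool):
    """Project each region vector, then take the elementwise maximum over regions.

    region_vectors: (M, D) array, one row per detected region.
    W_pool: (P, D) projection matrix, b_pool: (P,) bias.
    Returns the pooled vector of shape (P,).
    """
    projected = region_vectors @ W_pool.T + b_pool  # (M, P)
    return projected.max(axis=0)

# Toy example: M=3 regions, D=4 features, P=2 pooled dimensions.
rng = np.random.default_rng(0)
regions = rng.normal(size=(3, 4))
W_pool = rng.normal(size=(2, 4))
b_pool = np.zeros(2)
v_p = pool_regions(regions, W_pool, b_pool)
print(v_p.shape)  # (2,)
```

Because the maximum runs over the region axis, the pooled vector does not depend on the order in which regions are detected.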
Hierarchical Recurrent Network:
Why Hierarchical?
1. It reduces the length of time over which the recurrent networks must reason.
2. Because a generated paragraph is split into several sentences, each RNN in the hierarchy only needs to reason over a much shorter time-scale, which makes learning an appropriate representation much more tractable.
Sentence RNN: takes the pooled region vector v_p as input and produces a sequence of hidden states h_1, h_2, ..., h_S, one for each sentence in the paragraph. Each hidden state is used in two ways: to produce a distribution p_i that determines whether to stop, and to produce the topic vector t_i for the i-th sentence of the paragraph, which is the input of the word RNN.
Word RNN: the same as the LSTM component used in standard image captioning models.
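The two uses of each sentence-level hidden state can be sketched as follows. This is a simplified illustration, not the paper's architecture: the hidden states are given rather than produced by an RNN, the stop decision is one logistic unit, the topic vector is one linear layer, and `stop_threshold` and `max_sentences` are hypothetical parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def topics_until_stop(hidden_states, w_stop, b_stop, W_topic, b_topic,
                      stop_threshold=0.5, max_sentences=6):
    """Return one topic vector per sentence, stopping once p(STOP) exceeds the threshold."""
    topics = []
    for h in hidden_states[:max_sentences]:
        topics.append(W_topic @ h + b_topic)   # topic vector t_i, input to the word RNN
        p_stop = sigmoid(w_stop @ h + b_stop)  # probability that sentence i is the last one
        if p_stop > stop_threshold:
            break
    return topics

# Toy hidden states chosen so that the third one triggers the stop decision.
hidden_states = np.array([[-2.0, 1.0], [-1.0, 0.5], [3.0, 0.0], [4.0, 1.0]])
w_stop, b_stop = np.array([1.0, 0.0]), 0.0
W_topic, b_topic = np.eye(2), np.zeros(2)
topics = topics_until_stop(hidden_states, w_stop, b_stop, W_topic, b_topic)
print(len(topics))  # 3
```

Each returned topic vector would then be handed to the word RNN, which generates the words of that sentence independently of the sentence-level loop.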
Training and Sampling:
The training loss l(x, y) for an example (x, y) is a weighted sum of two cross-entropy terms: a sentence loss l_sent on the stopping distribution p_i, and a word loss l_word on the word distribution p_ij.
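A small sketch of how such a weighted loss could be computed for one example. It assumes the stopping target is STOP only for the last sentence, and the weights `lambda_sent` and `lambda_word` are placeholders, not the values used in the paper.

```python
import numpy as np

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the target class."""
    return -np.log(probs[target_index])

def paragraph_loss(stop_probs, word_probs, word_targets,
                   lambda_sent=1.0, lambda_word=1.0):
    """Weighted sum of the sentence (stopping) loss and the word loss.

    stop_probs[i]: array [p(CONTINUE), p(STOP)] for sentence i.
    word_probs[i][j]: distribution over the vocabulary for word j of sentence i.
    word_targets[i][j]: index of the ground-truth word.
    """
    num_sentences = len(stop_probs)
    l_sent, l_word = 0.0, 0.0
    for i in range(num_sentences):
        stop_label = 1 if i == num_sentences - 1 else 0  # STOP only on the last sentence
        l_sent += cross_entropy(stop_probs[i], stop_label)
        for p_ij, y_ij in zip(word_probs[i], word_targets[i]):
            l_word += cross_entropy(p_ij, y_ij)
    return lambda_sent * l_sent + lambda_word * l_word

# Toy example: two sentences of one word each, vocabulary of size 2.
stop_probs = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
word_probs = [[np.array([0.5, 0.5])], [np.array([0.25, 0.75])]]
word_targets = [[0], [1]]
loss = paragraph_loss(stop_probs, word_probs, word_targets)
print(loss)
```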
Experiments:
Recurrent Topic-Transition GAN for Visual Paragraph Generation (ICCV 2017)
Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric Xing
RTT-GAN
Towards Diverse and Natural Image Descriptions via a Conditional GAN (ICCV 2017)
Previous approaches, including both generation methods and evaluation metrics, primarily focus on the resemblance to the training samples.
Instead of emphasizing n-gram matching, we aim to improve the naturalness and diversity.
Generation. Under the MLE principle, the joint probability of a sentence is, to a large extent, determined by whether it contains the frequent n-grams from the training set.
When the generator yields a few words that match the prefix of a frequent n-gram, the remaining words of that n-gram will likely be produced following the Markov chain.
Evaluation. Classical metrics include BLEU and ROUGE, which focus on the precision and recall of n-grams respectively. Beyond them, METEOR uses a combination of both the precision and the recall of n-grams. CIDEr uses weighted statistics over n-grams. Such metrics mostly rely on matching n-grams with the "ground truths". As a result, sentences that contain frequent n-grams get higher scores than those using variant expressions. SPICE: instead of matching n-grams, it focuses on the linguistic entities that reflect visual concepts (e.g. objects and relationships). However, other qualities, e.g. the naturalness of the expressions, are not considered in this metric.
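To make the n-gram matching point concrete, here is a toy bigram precision/recall computation. It is not BLEU or ROUGE themselves (no brevity penalty, no averaging over n-gram orders, a single reference), and the sentences are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision_recall(candidate, reference, n=2):
    """Clipped n-gram overlap divided by candidate count (precision) and reference count (recall)."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection clips repeated n-grams
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall

reference = "a man is riding a horse on the beach"
literal = "a man is riding a horse"
paraphrase = "someone rides a horse along the shore"

print(ngram_precision_recall(literal, reference))     # (1.0, 0.625)
print(ngram_precision_recall(paraphrase, reference))  # precision 1/6, recall 0.125
```

The paraphrase describes the same scene but shares only one bigram with the reference, so both scores drop sharply, which is the bias toward frequent n-grams described above.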
The generator G takes two inputs: an image feature f(I) derived from a CNN and a random vector z.
Diverse and Coherent Paragraph Generation from Images (ECCV 2018)
github: https://github.com/metro-smiles/CapG_RevG_Code
The authors propose to augment paragraph generation techniques with "coherence vectors," "global topic vectors," and modeling of the inherent ambiguity of associating paragraphs with images, via a variational auto-encoder formulation.
Topic Generation Net and Sentence Generation Net
Training for Diversity in Image Paragraph Captioning (EMNLP 2018)
github: https://github.com/lukemelas/image-paragraph-captioning