Image Paragraph Paper Collection
A Hierarchical Approach for Generating Descriptive Image Paragraphs (CVPR 2017) Li Fei-Fei.
Dataset: http://cs.stanford.edu/people/ranjaykrishna/im2p/index.html
Workflow:
1. Decompose the input image by detecting objects and other regions of interest.
2. Aggregate features across these regions to produce a pooled representation that richly expresses the image semantics.
3. Feed this feature vector into a hierarchical recurrent neural network composed of two levels: a sentence RNN and a word RNN.
4. The sentence RNN receives the image features, decides how many sentences to generate in the resulting paragraph, and produces an input topic vector for each sentence.
5. The word RNN uses this topic vector to generate the words of a single sentence.
Region Detector:
CNN+RPN
Resize the image --> pass it through a CNN to get feature maps --> a region proposal network (RPN) processes the resulting feature maps --> regions of interest are projected onto the convolutional feature maps --> the corresponding region of the feature map is resized to a fixed size using bilinear interpolation and processed by two fully-connected layers, giving a vector of dimension D for each region.
Given a dataset of images and ground-truth regions of interest, the region detector can be trained end-to-end for object detection and for dense captioning.
Region Pooling:
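The region vectors are pooled into a single vector v_p by projecting each one and taking an elementwise maximum: v_p = max_i (W_pool v_i + b_pool). W_pool and b_pool are learned parameters, and the v_i are the vectors produced by the region detector.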
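The pooling step can be sketched in plain Python as follows. This is only an illustration, assuming each region vector is first projected as W_pool v_i + b_pool and the maximum is then taken per output dimension; the function name `pool_regions` and the toy numbers are made up for the example and are not from the paper.

```python
def pool_regions(regions, w_pool, b_pool):
    """Project every region vector, then take an elementwise max over regions."""
    projected = []
    for v in regions:
        # Affine projection of one region vector: W_pool @ v + b_pool
        row = [sum(w * x for w, x in zip(w_row, v)) + b
               for w_row, b in zip(w_pool, b_pool)]
        projected.append(row)
    # Elementwise maximum across regions: one value per output dimension
    return [max(column) for column in zip(*projected)]


# Toy input: three regions with D = 2, projected to a pooled dimension of 3
regions = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
w_pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_pool = [0.0, 0.5, -1.0]
v_p = pool_regions(regions, w_pool, b_pool)
```

Because the maximum is taken independently per dimension, different entries of v_p can come from different regions, and the result does not depend on the order of the regions.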
Hierarchical Recurrent Network:
Why Hierarchical?
1. It reduces the length of time over which the recurrent networks must reason.
2. Because a generated paragraph contains several sentences, splitting it across two levels means both the paragraph-level and sentence-level RNNs need only reason over much shorter time scales, which makes learning an appropriate representation much more tractable.
Sentence RNN: takes the pooled region vector v_p as input and produces a sequence of hidden states h_1, h_2, ..., h_S, one for each sentence in the paragraph. Each hidden state is used in two ways: to produce a distribution p_i that determines whether to stop after the i-th sentence, and to produce the topic vector t_i for the i-th sentence of the paragraph, which is the input of the word RNN.
Word RNN: the same as the LSTM component used in image captioning models.
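The control flow of the two-level decoder can be sketched as below. This is a hedged illustration of the loop structure only: `sentence_step` and `word_step` are hypothetical callables standing in for the trained sentence RNN and word RNN, the toy stubs return canned values, and the limits `max_sentences`, `max_words` and `stop_threshold` are placeholder defaults. It also assumes the sentence at which the stop signal fires is still generated.

```python
def generate_paragraph(v_p, sentence_step, word_step,
                       max_sentences=6, max_words=50, stop_threshold=0.5):
    """Outer loop: one topic per sentence. Inner loop: words of that sentence."""
    paragraph = []
    sent_state = None
    for _ in range(max_sentences):
        # Sentence RNN: new hidden state, stop probability p_i, topic vector t_i
        sent_state, p_stop, topic = sentence_step(v_p, sent_state)
        words, prev, word_state = [], "<start>", None
        for _ in range(max_words):
            # Word RNN: conditioned on the topic and the previous word
            word_state, prev = word_step(topic, prev, word_state)
            if prev == "<end>":
                break
            words.append(prev)
        paragraph.append(" ".join(words))
        if p_stop > stop_threshold:
            break  # assumed: this sentence is the last one of the paragraph
    return paragraph


# Toy stand-ins for the two trained networks (purely illustrative)
TOY_SENTENCES = [["a", "man", "rides", "a", "horse", "<end>"],
                 ["the", "sky", "is", "blue", "<end>"]]

def toy_sentence_step(v_p, state):
    index = 0 if state is None else state + 1
    p_stop = 0.9 if index >= 1 else 0.1
    return index, p_stop, index  # the "topic" here is just a sentence index

def toy_word_step(topic, prev, state):
    pos = 0 if state is None else state + 1
    return pos, TOY_SENTENCES[topic][pos]

paragraph = generate_paragraph([0.0], toy_sentence_step, toy_word_step)
```

The point of the structure is that the outer loop runs once per sentence and the inner loop once per word, so neither recurrence has to span the whole paragraph.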
Training and Sampling:
The training loss l(x, y) for an example (x, y) is a weighted sum of two cross-entropy terms: a sentence loss l_sent on the stopping distribution p_i, and a word loss l_word on the word distribution p_ij.
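A small sketch of this weighted loss, under some assumptions: distributions are plain Python lists of probabilities, the stopping target is taken to be STOP for the last ground-truth sentence and CONTINUE otherwise, and the names `paragraph_loss`, `lambda_sent` and `lambda_word` are chosen for this example. The weights are passed explicitly because these notes do not record the values used.

```python
import math

CONTINUE, STOP = 0, 1


def cross_entropy(probs, target):
    """Negative log-probability of the target index under a distribution."""
    return -math.log(probs[target])


def paragraph_loss(stop_probs, word_probs, targets, lambda_sent, lambda_word):
    """Weighted sum of the sentence (stopping) loss and the word loss."""
    num_sentences = len(targets)
    sent_loss = 0.0
    word_loss = 0.0
    for i in range(num_sentences):
        # Assumed labeling: only the final sentence carries the STOP label
        stop_label = STOP if i == num_sentences - 1 else CONTINUE
        sent_loss += cross_entropy(stop_probs[i], stop_label)
        # One cross-entropy term per ground-truth word y_ij
        for p_ij, y_ij in zip(word_probs[i], targets[i]):
            word_loss += cross_entropy(p_ij, y_ij)
    return lambda_sent * sent_loss + lambda_word * word_loss


# Two sentences with one word each, over a toy vocabulary of size 2
loss = paragraph_loss(
    stop_probs=[[0.5, 0.5], [0.5, 0.5]],
    word_probs=[[[0.5, 0.5]], [[0.25, 0.75]]],
    targets=[[0], [1]],
    lambda_sent=1.0,
    lambda_word=1.0,
)
```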
Experiments:
Recurrent Topic-Transition GAN for Visual Paragraph Generation (ICCV 2017)
Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric Xing
RTT-GAN
Towards Diverse and Natural Image Descriptions via a Conditional GAN (ICCV 2017)
Previous approaches, including both generation methods and evaluation metrics, primarily focus on the resemblance to the training samples.
Instead of emphasizing n-gram matching, we aim to improve the naturalness and diversity.
Generation. Under the MLE principle, the joint probability of a sentence is, to a large extent, determined by whether it contains the frequent n-grams from the training set.
When the generator yields a few words that match the prefix of a frequent n-gram, the remaining words of that n-gram will likely be produced following the Markov chain.
Evaluation. Classical metrics include BLEU and ROUGE, which focus on the precision and recall of n-grams respectively. Beyond them, METEOR uses a combination of both the precision and the recall of n-grams, and CIDEr uses weighted statistics over n-grams. Such metrics mostly rely on matching n-grams with the ground truths. As a result, sentences that contain frequent n-grams get higher scores than those using variant expressions. SPICE: instead of matching n-grams, it focuses on the linguistic entities that reflect visual concepts (e.g. objects and relationships). However, other qualities, e.g. the naturalness of the expressions, are not considered in this metric.
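The bias toward matching n-grams can be seen with a simplified overlap score. The sketch below computes clipped n-gram precision and recall against a single reference; it is not a full BLEU or ROUGE implementation (no brevity penalty, no averaging over n, no multiple references), and the helper names and example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision_recall(candidate, reference, n):
    """Clipped n-gram overlap, normalized by candidate and reference counts."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall


reference = "a man is riding a horse"
near_copy = "a man is riding a brown horse"
paraphrase = "someone rides a horse"

p_copy, r_copy = ngram_precision_recall(near_copy, reference, 2)
p_para, r_para = ngram_precision_recall(paraphrase, reference, 2)
```

Both candidates describe the same scene, yet the paraphrase shares only one bigram with the reference and scores far lower on both precision and recall than the near copy.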
The generator G takes two inputs: an image feature f(I) derived from a CNN and a random vector z.
Diverse and Coherent Paragraph Generation from Images (ECCV 2018)
github: https://github.com/metro-smiles/CapG_RevG_Code
The authors propose to augment paragraph generation techniques with "coherence vectors," "global topic vectors," and modeling of the inherent ambiguity of associating paragraphs with images, via a variational auto-encoder formulation.
Topic Generation Net and Sentence Generation Net
Training for Diversity in Image Paragraph Captioning (EMNLP 2018)
github: https://github.com/lukemelas/image-paragraph-captioning